Chapter 1. Introduction


In this thesis we are concerned with issues arising from the need to achieve
concurrency of operation within a computation on a large scale. Several factors contribute
toward increasing interest in systems capable of exploiting the concurrency of computation.
Concurrency provides the potential for performance improvement through concurrent operation
of hardware components such as processor and memory modules. This results in better
utilization of total resources and in faster response if a computation has a high level of
concurrency. The dramatic progress of technology has made concurrent systems more
attractive as an alternative for high performance systems. In particular, systems that have many
replicated hardware modules can take advantage of the projected potential of the processing
capability of a single chip device which can be very economically produced. Such systems may
further offer better capability and extendability of system performance.

So far, concurrent programming has not been adequately dealt with in conventional
programming languages. It is our belief that future systems must depart from the prevalent
view of sequential computation both at the programming language level and at the machine
organization level if substantial progress is to be made toward practical large concurrent
systems.

The goal of this thesis is to demonstrate that an adequate computation model can
provide a basis both for a good programming language and for an architecture that can fully
exploit the inherent concurrency in algorithms expressed in the language. To this end, we show
how a value-oriented language can be implemented based on a model of concurrent
computation known as data flow schemas [DenFo73] and how this implementation can guide the
design of an architecture that achieves a high level of concurrent operation.


The model of computation is based on the notion of data driven computation, in the
sense that an operation in a computation is executed as soon as all of the required operands
become available. Thus, there is no notion of sequential control of execution. Data flow
schemas allow many concurrent subcomputations to take place without creating side-effects.
The lack of side-effects is essential for several reasons. First, the existence of side-effects among
concurrent processes may cause the outcome of the computation to be dependent on the order in
which the processes are executed -- that is, the computation is nondeterminate. In most
applications, it is desirable to achieve concurrent operation while preserving the uniqueness of
the result of the computation. From the semantic point of view, a language that is free of
side-effects is easily formalized using denotational semantics [Stoy74]. Furthermore, when a
computation is expressed in a side-effect-free language, concurrency in the computation is easily
recognized as subcomputations which do not depend on the results of other subcomputations, and
this data dependency is manifest in the program structure.

We introduce a simple value-oriented language that has two important features:
streams, which are sequences of values communicated between computations, and forall
constructs, in which one can express concurrent operations on components of data structures. A
computation expressed in this language is guaranteed determinate unless explicit forms of
nondeterminacy are used. In this thesis, we consider a limited form of nondeterminacy that
merges two sequences of values in a nondeterminate manner. We discuss limitations of the
language in Section 1.4.

The architecture presented in this thesis is based on a form of data flow processor
proposed by Dennis and Misunas [DenMi, Misun78]. We show how the language can be
effectively implemented on this architecture such that concurrency of a computation can be
exploited. The main extension includes suggestions for the design of the storage of a large
number of activations of procedures and data structures such that contentions in these data
structures can be alleviated.


In the next two sections, we give a brief discussion of computer systems designed for
achieving highly concurrent operations and programming languages for expressing concurrent
computations. Section 1.3 explains the data flow concept.


1.1 Concurrent Systems 

Many computing systems [Kuck77, YauFu77, Enslo77] have departed from conventional
computer organizations to improve the capability for concurrent execution. A class of such
processors belongs to the category of SIMD (Single Instruction Multiple Data) machines
[Flynn72]. For instance, there are array processors represented by the ILLIAC IV [Barns68],
associative processors like the STARAN [Batch74], and vector processors such as the CDC
STAR 100 [Hintz72]. These processors perform well only when the computation can be
expressed in program and data structures which are easily mapped onto the particular machine
structures. Array processors require that data structures be mapped onto a fixed structure
imposed by the physical arrangement of the processors, such as a two-dimensional array.
Associative processors require that data structures be linear lists of words so that associative
operations on parts of these words can be efficient. For vector processors, data structures must
be in the form of one-dimensional arrays to allow pipelining of operations on successive array
elements. Furthermore, programs must exhibit a high degree of locality of reference such that a
significant amount of data structure movement is not necessary during the execution. This
dependence on locality of reference arises because the performance is achieved by short
instruction execution delays and by special pipelined execution units or by many tightly
synchronized independent execution units.

Unfortunately, the class of computations having these properties is rather limited;
hence, much effort has been devoted to transforming programs -- either by the application
programmer or by compilers -- so that efficient execution can be achieved [Lampo74, Kuck77].
In fact, even in the limited domain of numerical computations for which these processors are
designed (or intended), there is a high degree of irregularity in computations so that these
processors cannot easily achieve their potential performance.¹

The strong dependence on locality of reference and special features such as vector
instructions inevitably tempts programmers to be explicitly aware of the hardware features
of the processors. This awareness often leads to programming errors due to concern with the
optimization of programs. In this sense, these processors share the common problems that the
programming issues are neglected and that the performance can neither be readily extended by
introducing more execution units nor by moving from a processor of one configuration to
another without a substantial amount of effort in program conversion.²

There are concurrent processors that belong to the category of MIMD (Multiple Instruction
Multiple Data) machines. A typical realization of this form of machine is based on a multiple
processor and shared multiple memory organization. Examples of such processors are Pluribus
[Ornst75], C.mmp [WulBe72], and Cm* [SwFuS77].³ The predominant problem of these
processors is that the system performance is based on the assumption of locality of reference
achieved by programmers' explicit partitioning of a computation. Furthermore, because the
semantics of the languages supported by these systems are based on the notion of sequential
execution and operations which have side-effects, concurrency is achieved through careful
analysis of programs to prevent possible deadlocks and bottlenecks in memory references.


1. We refer the reader to [KisRu75] for an example of how program mixtures have affected
the performance of one of these processors. It is interesting to note that the CRAY computer
[RamLi77] is designed with more recognition of this fact than previous vector computers by
improving operations on vectors of short length.

2. Note, however, that the difficulty of transporting software among different systems is a
pervasive problem of existing systems as well.

3. We refer readers to [Enslo77] for a more detailed discussion on machines based on multiple
processor organizations.


1.2 Concurrent Programming Languages

Yet, what is a good concurrent programming language? There are two essential
properties of a program: correctness and performance. The motivation behind structured
programming is a consequence of the concern over the difficulty of establishing correctness of
programs and of improving the productivity of the programming task. The task of concurrent
programming, however, is much more difficult than that of sequential programming because
the existence of concurrency makes any interaction between concurrent processes nontrivial. It
should, therefore, be an essential design objective of a concurrent programming language to
have the property that unnecessary programming difficulty is not introduced to improve the
concurrency exhibited by programs.

There are several concepts which are unique to concurrent computations. In the
execution of many concurrent processes, it is possible that the order in which the operations are
performed affects the outcome of the computation. Such computations are said to be
nondeterminate. Conversely, a computation whose results are guaranteed to be the same when
the set of concurrent subprocesses is executed in any allowable order is said to be determinate.¹
Since many concurrently running processes may depend on the results of or synchronization by
other processes, it is possible that a set of processes may become simultaneously dependent on
the results of each other. If none of the processes can proceed further, then the set of processes
is said to be in deadlock. Deadlocks occur in many forms depending on the possible situations
which can arise to prevent a process from being able to proceed. The purest form of deadlock
is that in which the computation itself can run into deadlock even if the amount of computational
resources is infinite. In this case, what causes deadlocks is the semantics of the computation
rather than the manner in which resources are allocated.

1. A computation which contains nondeterminate subcomputations may itself be determinate.
Thus, the class of computations expressible with operations which cannot introduce
nondeterminacy is strictly contained by the class of determinate computations.
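The notion of a nondeterminate computation can be illustrated with a small sketch. The Python fragment below is not taken from the thesis; it assumes two hypothetical processes, P and Q, each of which reads a shared variable x and later writes back an updated value. The helper names (interleavings, run, update) are our own. The sketch enumerates every allowable order of the four operations and collects the final values of x.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]

def run(schedule, update):
    """Execute a schedule of (process, step) pairs against a shared variable x."""
    x = 0
    local = {}  # value each process last read from x
    for proc, step in schedule:
        if step == "read":
            local[proc] = x
        else:  # "write": store a value computed from the earlier read
            x = update[proc](local[proc])
    return x

# Hypothetical processes: P adds 1 to x, Q adds 2 to x.
update = {"P": lambda v: v + 1, "Q": lambda v: v + 2}
p_ops = [("P", "read"), ("P", "write")]
q_ops = [("Q", "read"), ("Q", "write")]

schedules = list(interleavings(p_ops, q_ops))
outcomes = {run(s, update) for s in schedules}
print(len(schedules), sorted(outcomes))  # 6 [1, 2, 3]
```

Because the two processes communicate through a side-effect on x, the six allowable orders yield three different results, so the computation is nondeterminate in the sense defined above.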

We now give a historical perspective of the problems of various approaches to
concurrent programming, then outline in Section 1.3 an approach we feel may alleviate these
problems and is followed in this thesis.

A natural development of concurrent programming has been to extend the existing
semantic basis to include explicit process control primitives. An example is the introduction of
call and wait primitives of PL/I which provide explicit control over the creation and
resumption of processes. The coordination of concurrent processes is achieved by additional
control primitives which interrupt and resume the control of a process with explicitly specified
signals and with conditions which dictate when the control of a process may be influenced by
signals from other processes. Another approach uses mechanisms such as semaphores and P
and V primitives to coordinate these processes [Dijks68].

These forms of concurrent programming are at too low a level of abstraction to be
good programming constructs in several ways.

It is often the case that a given computation when expressed in different sets of
primitives results in quite different program structures. These differences arise not from the
conceptual scheme of the computation but rather from the explicit control mechanisms that must
be used.

Another consequence is that programmers tend to become very aware of the efficiency
of the mechanisms. For instance, the cost of creating and controlling a process is often
prohibitively high due to the inherent complexity of the semantics of these programming
languages. The programming task is, therefore, further impeded because users often create
processes with explicit concerns over resource management. (This, in a sense, is analogous to
the situation when programmers had to be explicitly aware of the memory management in
writing large programs before the use of automatic memory management became common
practice.)

In many situations, one finds that the computation is inherently determinate, but the
program expressed in these forms is nondeterminate in the presence of programming errors.
Thus, there is no way to ensure determinacy when it is desirable. Tests or proofs for the
program behavior are, therefore, unnecessarily complex, since the possible outcome of a
computation is a set whose size depends greatly on the number of interacting processes.
Furthermore, even in the presence of desired nondeterminacy, none of the individual
subprograms can be validated independently. This deficiency for independent validation is
attributable not only to the semantics of these primitives but also to the use of global variables
that many concurrent processes can access and modify.

More recent approaches for concurrent programming emphasize the ease of validation
of correctness for concurrent programs. Examples of language constructs using these
approaches are monitors [Hoare74], path expressions [LauCa75], and guarded commands
[Dijks75]. Note that these constructs are defined in conjunction with restricted use of variables
and the flow of control. These represent steps toward a more structured and higher level of
concurrent programming. A common feature of these approaches, however, is that concurrency
is created explicitly with constructs such as the cobegin block or the guarded command blocks.
Thus, the concurrency expressed is at the level of processes rather than at the level of
operations where a substantial amount of concurrency also exists.


1.3 Data Flow Concept

Developments in the theory of parallel computation have motivated a computation
model called data flow schemas [DenFo73]. This model is one of many models [Fosse72, Kosin73,
ArvGo77] based on the data flow concept. The model represents a computation only in terms of
data dependencies between instructions, and reveals inherent parallelism without unnecessary
constraints on instruction sequencing imposed by the conventional machine level
representations.


1.3.1 Data Flow Languages

Because the data flow model is graphical in nature, numerous studies [Denns74,
ArGoP77, Rumba75, Kosin73, Weng75] have attempted to define textual programming
languages based on these models. While it is possible to define an algorithm that transforms
programs written in existing sequential programming languages into data flow schemas, such an
algorithm is complex because of the semantics of the sequential programming languages.
Furthermore, the inherent concurrency of a computation is often hidden from the translator
because there are additional constraints that are built into the expressiveness of sequential
programming languages.¹ We believe that high level data flow programming languages will
allow algorithms for concurrent computation to be easily expressible.

Programming languages based on the data flow concept are sufficiently expressive to
encompass conventional programming language constructs such as iterations, while-loops,
conditionals, procedures, and data types such as data structures and procedure values. These
constructs, however, are embedded in a semantics which is free of both side-effects and the
sequential control of execution. The distinctive lack of control transfer primitives such as
GOTO's and operations which introduce side-effects allows compilers to easily detect data
dependencies between operations in a program. Languages with these characteristics have been
shown to have simple denotational formal semantics [Stoy74, Brock78].


1. This phenomenon is a well known fact among researchers working on optimizing compilers
both for sequential processors and concurrent processors. For example, use of array indexing
and common variables in large Fortran programs makes many optimizations difficult if not
impossible.


Additional features such as forall constructs, primitives for stream values, and
constructs for nondeterminate computations are found to be natural extensions to these
languages. The forall constructs allow programmers to specify concurrent operations on all
components of a data structure. The notion of stream provides an alternative to the use of
coroutines and synchronization primitives for expressing computations passing sequences of
values among their component modules.
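As a rough illustration of these two features, the following Python sketch mimics a forall construct and a stream computation. It does not use the syntax of the language defined in this thesis; the function names (forall, integers, running_sum) are hypothetical, a thread pool stands in for concurrent evaluation, and Python generators stand in for streams.

```python
from concurrent.futures import ThreadPoolExecutor

def forall(f, components):
    """Apply the side-effect-free function f to every component.

    Since f has no side-effects, the applications may be evaluated in any
    order, or concurrently, without changing the result.
    """
    with ThreadPoolExecutor() as pool:
        return list(pool.map(f, components))  # map keeps the input order

def integers(n):
    """A stream: a sequence of values produced one at a time."""
    for i in range(n):
        yield i

def running_sum(stream):
    """A stream computation: consumes one stream and produces another."""
    total = 0
    for value in stream:
        total += value
        yield total

squares = forall(lambda v: v * v, [1, 2, 3, 4])
sums = list(running_sum(integers(5)))
print(squares)  # [1, 4, 9, 16]
print(sums)     # [0, 1, 3, 6, 10]
```

The producer and consumer of the stream are written as independent modules and no explicit synchronization primitive appears in either of them, which is the point made in the paragraph above.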

A very important characteristic of these languages is that the determinacy of a
computation is guaranteed when the computation is expressed without using primitives or features
explicitly provided for situations where nondeterminacy is required. In conventional
languages, nondeterminate computations are expressed using semaphore primitives, call and
wait primitives, and monitors. The semantics of these primitives, however, are not consistent
with the semantics of data flow languages. Because there are significant applications where
nondeterminacy is necessary, the formal semantics of languages with nondeterminate primitives
is still an active area of research [Plotk76, Milne79, Kelle77]. In this thesis, we have chosen a
very primitive form of nondeterminacy which seems essential as a basis for higher level
constructs for nondeterminate computations.


1.3.2 Data Flow Processors


Data flow schemas are not only a suitable vehicle for representing concurrent
computations but also provide a simple operational semantics which has suggested several new
computer architecture designs. Another characteristic of the model which is not often cited is
that data flow graphs are very flexible bases for machine level representations. These
representations are applicable to a wide class of computer architectures, including architectures
extended from conventional processor and memory organizations.


The common characteristic of all data flow processors is the use of some machine level
representation of data flow graphs. Assuming that a data flow program already resides in a
processor, its execution requires mechanisms for

(1) detection of conditions for an instruction to be executable,

(2) execution of the instruction, and

(3) transmission of the result to the instructions requiring it as an operand.
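The three mechanisms listed above can be sketched as a small data-driven interpreter. The Python fragment below is only an illustration under our own assumptions and does not describe any of the processors discussed in this chapter: the instruction format (function, operand count, destination list), the name execute, and the special destination "out" are all hypothetical.

```python
import operator

# Hypothetical instruction format: name -> (function, operand count, destinations).
# A destination is (instruction name, operand position); "out" collects results.
# The program computes (a + b) * (c - d).
program = {
    "add": (operator.add, 2, [("mul", 0)]),
    "sub": (operator.sub, 2, [("mul", 1)]),
    "mul": (operator.mul, 2, [("out", 0)]),
}

def execute(program, initial_tokens):
    """Run a data flow program; tokens are (destination, position, value)."""
    operands = {name: {} for name in program}
    results = []
    pending = list(initial_tokens)
    while pending:
        dest, pos, value = pending.pop()
        if dest == "out":
            results.append(value)
            continue
        operands[dest][pos] = value
        fn, arity, destinations = program[dest]
        # (1) detection: the instruction is executable once all operands are present
        if len(operands[dest]) == arity:
            args = [operands[dest][i] for i in range(arity)]
            operands[dest] = {}
            # (2) execution of the instruction
            result = fn(*args)
            # (3) transmission of the result to the instructions requiring it
            pending.extend((d, p, result) for d, p in destinations)
    return results

tokens = [("add", 0, 1), ("add", 1, 2), ("sub", 0, 7), ("sub", 1, 3)]
print(execute(program, tokens))        # [12]
print(execute(program, tokens[::-1]))  # [12]
```

The order in which the initial operands arrive does not affect the result, because an instruction is executed only when all of its operands are available and no instruction has side-effects.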

The processor proposed by Dennis and Misunas [DenMi, Misun75] consists of five
sections: Instruction Memory, Arbitration Network, Functional Unit, Distribution Network, and
Packet Memory. The Instruction Memory stores the machine level representation of a data flow
graph so that enabled instructions can be independently detected and sent to the Arbitration
Network as operation packets. The Arbitration and Distribution Networks are packet
switching networks. The Functional Unit processes operation packets in a pipelined fashion.
The Packet Memory performs data structure operations and memory management. The most
distinguishing characteristic of the processor is that its performance is not derived from any
assumption about the locality of allocations of the program; thus, the program execution is not
dependent on where each instruction resides. Different assumptions about the locality of
computations result in great differences in the architectures of concurrent processors. While it is
often the case that a computation exhibits locality of reference,¹ it has not been demonstrated
that concurrent processors taking advantage of this fact are not subject to significant
performance degradation when this assumption is violated by parts of a computation.


1. In particular, Swan has observed that references to the code of procedures represent a large
portion of memory references and exhibit a high degree of locality of reference [Swan78].


1.4 Scope of the thesis


In this thesis we present an implementation of a programming language on an
extended form of the Dennis-Misunas architecture. The extension includes storage of procedure
activations, stream values, and data structures in the Packet Memory, and we suggest a way to
perform memory management for copies of data structures.

We chose a well defined programming language as the basis for extending the 
capability of the processor. This language has features which allow concurrency to be 
expressed in two forms and still guarantees that the computation is determinate and 
deadlock-free regardless of programming errors. The first form is based on procedure 
activations which automatically create concurrently executable procedure instances: this is the 
most familiar form of concurrency. The second form is based on the notion of stream 
computations (or pipeline computations in some sense): this form of concurrency is frequently
seen in large software such as compilers or in many operating system functions such as
input/output which are often expressed either in the form of coroutines or in the form of
coordinated processes [Conwa63, McIlr68, Hoare78].

Based on the notion of stream computations we provide a primitive for expressing
nondeterminate computations by merging two streams of values in a first-in-first-out manner.
Though this is a very primitive language construct, we feel it is an essential low level primitive
for implementing other forms of nondeterminate constructs.
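The behavior of such a merge can be approximated in a conventional setting as follows. This Python sketch is an assumption-laden illustration rather than the primitive defined in this thesis: two threads stand in for the computations producing the input streams, a FIFO queue records the order in which values arrive, and the names merge, feed, and _END are hypothetical.

```python
import queue
import threading

_END = object()  # marks the end of one input stream

def merge(stream_a, stream_b):
    """Merge two streams; the order of arrival decides the interleaving."""
    arrivals = queue.Queue()  # first-in-first-out

    def feed(stream):
        for value in stream:
            arrivals.put(value)
        arrivals.put(_END)

    workers = [threading.Thread(target=feed, args=(s,))
               for s in (stream_a, stream_b)]
    for w in workers:
        w.start()

    merged, finished = [], 0
    while finished < 2:  # stop once both inputs are exhausted
        value = arrivals.get()
        if value is _END:
            finished += 1
        else:
            merged.append(value)
    for w in workers:
        w.join()
    return merged

a = [("a", i) for i in range(3)]
b = [("b", i) for i in range(3)]
merged = merge(a, b)
print(merged)  # the interleaving may differ from run to run
```

Every value of both inputs appears in the output and the relative order of values from the same input is preserved; only the interleaving between the two inputs is left unspecified, which is the sole source of nondeterminacy.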

There are two ways in which some readers may consider our language limited. The
language does not provide any construct for expressing a set of concurrent processes whose
communication path forms a cyclic structure. This limitation is due to the general belief that
the deadlock property for such a set of processes is not decidable in general at compile time.
The second limitation is that we have not included procedures as values. This is because we
have not found a satisfactory solution to their implementation. The data structures in the
language do not include any cyclic data structures such as doubly linked lists or cyclic graphs.
The extension of the language to include such structures can be based on the notion of
immutable objects which contain cyclic structures such that the semantics of the data structures
is free of side-effects [Hende75]. This is an interesting issue of both practical and theoretical
importance that we have not been able to scrutinize in depth in this thesis.


1.4.1 Related Work 
The model of data flow computation proposed by Arvind and Gostelow [ArvGo77] is
based on an interpreter that is quite different in philosophy from data flow schemas. The
model does not introduce the notion of arcs of finite buffering on which data flow schemas are
based, and results in an architecture different from the Dennis-Misunas architecture. Other
data flow research includes the DDM1 model by Davis [Davis78], the model by Kosinski
[Kosin73], the graph model introduced at UCLA [BaBoE], the LAU system [SyCoH77], Gurd
and Watson [GurWa77], and Treleaven [Trele77]. (This list is by no means complete.)

More recently, many workers have begun to investigate the data flow concept within
languages which have side-effect-free semantics, such as reduction languages
[Berki75] and FFP systems [Backu78], and LISP based systems [FriWi76, KePaL78].¹ These
languages have a different approach toward concurrency and may have
interpreters whose operations are highly concurrent due to the side-effect-free nature of the
languages.

In analyzing the structures of data flow processors, we can define two classes of
processor organizations in the broad spectrum of possible structures that have been proposed.


1. The Actor model [...], based on the message passing style of programming, also
as a 


The first class consists of processors that have a large set of homogeneous modules
connected by a network. Each module has a functional unit and a local memory, and all
executable instructions are performed within the module. Processors of the second class do not
have uniformly identical modules, and each module is specialized to perform a particular
function, such as detection of executable instructions, execution of scalar operations, or switching
of packets containing instructions or data. The types of networks for both classes range from
simple bus structures to routing networks for handling packets of varying lengths. These
networks are not intended for performing communications over a very long distance and
therefore may not directly imply that the processors naturally extend to geographically
distributed systems.¹

The processors proposed by Davis, Syre, and Arvind and Gostelow can be considered 
to belong in the first class, and the second class is represented by the processor proposed by 
Dennis and Misunas.

Davis has proposed a hierarchical processor structure similar to a tree in which each
processor module is allowed to communicate with its parent and a fixed number of child
modules. Each module is capable of storing large segments of data flow programs and of
partitioning a segment into subsegments which are sent to child processors as concurrently
executable subcomputations. Because of its tree-like structure, this processor has the potential
problem of unbalanced utilization of modules. The partitioning of a computation can also
result in communication problems, since communication between child modules is made through
parent modules. It has been proposed that these difficulties may be overcome by additional
connections between leaf nodes of the tree.


1. The problems of detection and recovery from faulty communication links or processors, and
those of resource management are but some of the issues that are emphasized in
distributed systems.


Syre proposes a bus-oriented network connecting a set of modules, each of which has a
special control mechanism for detecting executable instructions. The allocation of processes to
the modules is partly performed by the compiler that preprocesses each procedure by dividing it
into segments for easy allocation of resources at run time. Some information needed for compile
time allocation is supplied by the programmer in the high level language program. The bus
network is adequate for connecting a limited number of modules, but is not extendable to a
much larger number of modules.

Arvind and Gostelow propose a ring network containing a number of ring interfaces
each of which connects to a set of modules through a bus. Each set of these modules also shares
a memory controller which provides accesses and movements of data between a module and the
large memory.

The main differences between our processor and these processors are the Packet
Memory which is needed for general purpose computations and the assumption about the
requirements of the networks. It is not clear how streams and foralls can be effectively
implemented on these processors.


1.4.2 Hard problems

It remains untested whether programming languages based on the notion of data flow
or the notion of side-effect-free semantics are applicable to designing counterparts of
conventional operating system functions and to various techniques of heuristic programming
found in the area of artificial intelligence.

For any system that is capable of creating a large number of concurrent activities, there
are several inherently hard problems that need to be solved. The most critical problem is
resource management, which must not only be efficient for allocating processes but also provide
mechanisms for controlling concurrent activities so the system is not overwhelmed by an
excessive number of activities. For systems that intend to support a wide range of applications,
it may be necessary to provide mechanisms for aborting a computation which might never
terminate or whose results are known not to be needed. For sequential computers, these
mechanisms are supported by controlling the process states in the process queue of the system;
but, for a highly concurrent system where activities may spread over a large number of
hardware resources, it is not clear how these functions can be supported without degrading the
performance promised by concurrency. For programming languages which can express
computations that may result in deadlocks due to mutual data dependencies among processes, it
is also necessary to have the above mentioned mechanisms.

It is important to realize that the limited nature of the scope of this thesis is due to our
lack of understanding of the above problems and lack of simple solutions to them. It should be
of great interest to readers to examine various proposed systems which exhibit a high degree of
concurrency, whether they are based on the data flow concept or not, with awareness of these
problems.


1.5 Synopsis

In Chapter 2, we present data flow schemas for completeness. This chapter also
includes a short introduction to data structures. We have excluded data flow schemas which
correspond to language constructs such as while-loops, and instead, we use recursions as an
equivalent form of such constructs.

Chapter 3 introduces a simple programming language which is value-oriented. This
language demonstrates that a clean programming language can be defined and translated into
data flow schemas presented in Chapter 2. The procedure names in the language are globally
defined. We include a discussion on issues related to extending the language for defining local
procedures and handling procedure values.


Chapter 4 shows how an encoding of data flow schemas can be defined. We give a
short introduction to the structure of the data flow processor and how the representation of
encoded data flow schemas can be used to implement procedure activations.

Chapter 5 introduces the data flow operators that allow expression of stream
computations in a natural manner. The straightforward implementation of streams based on
these operators is very inefficient; therefore, we show an implementation of streams that is based
on the notion of holes. We also introduce a primitive that nondeterminately merges two
streams. We describe how several limited forms of forall constructs can be translated into
recursive forms which exploit the concurrency in a natural manner.

Chapter 6 shows how resources for storing procedure activations can be allocated and
supported. We show how simultaneous accesses to data structures can be handled in a
multi-port and multi-cache memory organization, while implementing reference count memory
management.

Concluding remarks and directions of further research are in Chapter 7.

Chapter 2. Data Flow Schemas


In this chapter we introduce an operational model for concurrent computation that has 
evolved from many similar graphical operational models used for studying the properties of 
concurrent computation. The earliest models were pioneered by Adams, Karp and Miller, and 
Rodriguez [Adams68, KarMi66, Rodri69]. These models were intended for investigating the 
decidability of properties of concurrent computations such as deadlocks, nondeterminacy, 
equivalences of program graphs, and comparative power for expressing concurrent computation. 

Later works [Denns74, Kosin73, ArvGo77] are more oriented toward defining 
operational models as a basis for programming concurrent computations, and as a basis for 
investigating the degree of concurrency achievable. We are interested in the Data Flow Schema 
proposed by Dennis and Fosseen [DenFo73], because this model has evolved to the point that
we are able to express naturally most language features found in existing high level 
programming languages. Furthermore, this model guarantees that computations expressed in 
the model are determinate while exhibiting a high degree of concurrency. We present a slightly
modified version of the data flow schemas that does not have cyclic schemas and allows
recursions.


2.1 Recursive data flow schemas 
The data flow schema is an operational model of computation and consists of a graph 
representation and an interpreter which operates on the representation. A data flow schema is 
a directed graph whose nodes are actors connected by directed arcs. An arc pointing to an actor 
is called an input arc of the actor, and an output arc is an arc emanating from the actor. Each
actor has an ordered set of input arcs and output arcs. There are five types of actors: link,
operator, switch, merge and sink. The five types of actors are shown in Figure 2.1. An (m, n)
data flow schema must have m links which do not have input arcs, and n links not having
output arcs. These links are respectively called internal input links and internal output links
of the (m, n) schema. Further, we require that the schema must be proper in the sense that all
other actors must have the arcs required of each type and each arc must be connected at both
ends.

Figure 2.1. Data Flow Actors: (1) link, (2) operator, (3) switch (with a data input and a control input), (4) merge, (5) sink

Description of the operational semantics of data flow schemas requires additional 
concepts: availability of data at the inputs and firing rules that define how a computation 
proceeds. A configuration of a data flow schema is the graph of the schema together with an 
assignment of labeled tokens to some arcs of the graph. An assignment of a token to an arc is 
represented by the presence of a solid disk on an arc. The label denotes the value carried by 
the token and may be omitted when the value is irrelevant to our presentation. Informally, the 
presence of a token on an arc means that a value is made available to the actor to which the 
arc points. In this thesis, we shall assume that the tokens carry values of types integer, real, 
boolean, or structure.

To describe a computation of an application of an (m, n) schema to some input values, 
we introduce the notion of snapshots: a snapshot consists of a configuration connected to a set of 
input and output arcs as shown in Figure 2.2. The computation of a data flow schema when 
applied to a set of input values is described by a sequence of snapshots. The initial snapshot 
consists of the graph shown in Figure 2.2 and an initial configuration which only has tokens on 
the input arcs as inputs to the computation. The computation advances from one snapshot to 
the next through the firing of some actor that is enabled in the previous snapshot. The
condition under which an actor is enabled is depicted in Figure 2.3. It should be noted that a
necessary condition for any actor to be enabled is that each output arc does not hold a token.


Figure 2.2. Snapshots: (a) initial snapshot; (b) final snapshot, an (m, n) schema with no enabled actors
Figure 2.3. Firing Rules


Firing Rules 

A typical actor is enabled by the presence of a token on each input arc, with the exception
of a merge. The firing of an actor absorbs tokens from its input arcs and places a token on
each of the selected output arcs. The values of the output tokens are functionally related to the
values of the input tokens. A link simply replicates the token received and distributes it to the
destination actors, that is, the actors to which an output arc is connected. The effect of the firing of an
operator is to apply to the inputs v1, ..., vm the function associated with the operation name
written inside the operator to yield the outputs u1, ..., un. We generally require that labels be
used to identify the type of the values carried by each arc, but will omit them when their types
are clear from the context. The switch and merge are used for controlling the flow of tokens. A
switch requires a data input and a control input of boolean value from the set {true, false}. The
firing of a switch replicates the input token on one of the output arcs according to the boolean
control value. The arrival of a token on either input arc enables a merge, and upon firing a
token of the same value is placed upon the output. The behavior of a merge is inherently
nondeterminate when two input tokens reside on the input arcs; neither token is lost, but the
firing rule does not specify in which order the output tokens will be generated.¹ A sink absorbs
the input tokens upon firing and places a special token signal on the output arc. The purpose
of a sink actor is to absorb unwanted values; the signal output token is necessary for the
implementation of schema application described in Chapter 4.

The set of functions commonly associated with the operator actor includes the scalar
arithmetic operations and constant functions.


1. We choose the merge instead of the determinate merge of [DenFo73], because in recursive
data flow schemas the chosen nondeterminate merge can safely replace the determinate merge
and its use results in less complicated graphs.
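
The firing rules above can be illustrated with a minimal Python sketch. It is not the thesis's interpreter: the representation (a dictionary mapping arc names to a token value, with None for an empty arc) and the function names fire_operator, fire_switch and fire_merge are assumptions made for illustration. The merge here always tries its first input before its second, which is only one of the orders the nondeterminate firing rule permits.

```python
# Hypothetical representation: arcs maps an arc name to its token value,
# or to None when the arc is empty.

def fire_operator(arcs, inputs, outputs, fn):
    """Fire an operator if every input arc holds a token and every output arc is empty."""
    if any(arcs[a] is None for a in inputs):
        return False
    if any(arcs[a] is not None for a in outputs):
        return False
    values = [arcs[a] for a in inputs]
    for a in inputs:
        arcs[a] = None                      # absorb the input tokens
    for a, v in zip(outputs, fn(*values)):  # fn returns one value per output arc
        arcs[a] = v
    return True

def fire_switch(arcs, data, control, out_true, out_false):
    """Route the data token to one output arc according to the boolean control token."""
    if arcs[data] is None or arcs[control] is None:
        return False
    if arcs[out_true] is not None or arcs[out_false] is not None:
        return False
    target = out_true if arcs[control] else out_false
    arcs[target] = arcs[data]
    arcs[data] = None
    arcs[control] = None
    return True

def fire_merge(arcs, in_a, in_b, out):
    """Pass on a token from either input arc; neither token is ever lost."""
    if arcs[out] is not None:
        return False
    for a in (in_a, in_b):                  # assumed order; the rule leaves it open
        if arcs[a] is not None:
            arcs[out] = arcs[a]
            arcs[a] = None
            return True
    return False
```

A snapshot sequence then corresponds to repeatedly calling these functions on one arcs dictionary until none of them returns True.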


2.2 Well formed data flow schemas 

Unrestricted use of switch and merge is undesirable since arbitrary connection of these 
actors may form schemas which deadlock or are nondeterminate. Because these properties are 
undesirable for reliable programming, we choose a subclass of such schemas which will satisfy 
the needs of programming.

An (m, n) well formed data flow schema is an (m, n) data flow schema formed by any 
acyclic composition of component data flow schemas, where each component is either a link, a 
sink, an operator, or a conditional subschema. The structure of a conditional subschema is 
shown in Figure 2.4, where the heavily darkened arcs are labeled by letters denoting the 
number of arcs they represent. If P is an (m, n) subschema, Q is an (m, n) subschema and D is
a (k, 1) subschema whose output is of type boolean, then the conditional subschema is an (m, n)
subschema. Constructing a conditional schema from subschemas of different arity can be done
by patching sink actors within each subschema.


2.3 Apply actors 

The class of well formed data flow schemas as defined cannot express program features
such as procedures, procedure applications, and iterations. We introduce an operator apply
whose symbol is shown in Figure 2.5. The first input to an apply actor is a token carrying a
name uniquely associated with an (m, n) well formed data flow schema which may also contain
apply actors. An apply actor is enabled when a token resides on each input arc.¹ The effect of
firing an apply operator is to modify the snapshot by replacing the actor with the (m, n) schema
designated by the name as shown in Figure 2.5. This replacement connects the input arcs
carrying values v1, ..., vm to the m input links of the schema and the n output links to the
output arcs u1, ..., un of the apply actor. Notice that the symbol of an apply operator allows
one to define a data flow schema which involves recursive applications of the same schema by
naming each data flow schema. In this model, then, there is a global name space in which all
schemas are given a unique name.

1. This enabling condition is actually a very restricted form of procedure application, and does
not satisfy some requirements of models which have the property of referential transparency
[Stoy77]. Furthermore, this form of firing rule reduces asynchrony of the computation. We will
discuss this in greater detail in Section 2.5.

Figure 2.4. A Conditional Schema

Figure 2.5. The Apply Actor: (a) notation for apply; (b) firing rule

An example of the use of apply actors is shown in Figure 2.6. It is a (3, 2) schema that
is recursively defined, and computes the factorial of an integer greater than one. The first link
actor labeled trigger is an input link whose function is to trigger constant actors for generating
constants. The second link labeled f is for carrying the name of the procedure to the first input
of the apply actor that uses the name to create another instance of the same schema. The merge
actor labeled signal is to allow a proper construction of a conditional schema that may contain
subschemas which use sink actors. (Notice that the apply actor has a special output arc which
carries a signal value. This is a convention that we have adopted and can be optimized in
many situations.)

We have not included the class of data flow schemas which corresponds to language
constructs such as while loops in Algol 60 or Do statements in Fortran or PL/I. Such data flow
schemas [DenFo73] are constructed by cyclic connections of data flow actors; thus, the firing rule
of actors that require the output arcs to be empty for their enabling must be observed. To
implement this firing rule faithfully would require each actor to receive an acknowledge signal
from each of its destination actors in addition to input tokens.¹ In addition, the merge actor
must be a determinate merge actor [DenFo73] which requires a control input to determine which
input token is to be passed to the output arc. The use of acknowledge signals, however, can be
eliminated when the schemas are free of cyclic connections. This has the advantage that the
firing of each actor is not delayed by waiting for acknowledge signals from its destinations.
Furthermore, there is no need to encode into instructions the information required for returning
acknowledge signals. This leads to a simpler mechanism for implementing procedure
activations if the class of data flow schemas is restricted to acyclic schemas.

1. We refer the reader to [Misun75] for an example illustrating this point.

Figure 2.6. Recursive data flow schema for factorial: f is a (3, 2) schema with inputs trigger, f and x, and outputs signal and result
f(x) = if x <= 1 then x else x * f(x-1) end

For these reasons, we choose to implement these language features in their equivalent
form of self recursive application of data flow schemas. This has the desirable property that,
without any compile time analysis, the mechanism for procedure activation allows simultaneous
execution of different instances of the data flow schema which correspond to different iterations
of a while loop.
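
As an informal illustration of this equivalence, the following Python sketch writes the same computation once as a while loop and once as a self recursive function in which each call plays the role of one schema activation. The function names and the summation example are assumptions chosen for illustration; the factorial function mirrors the definition given with Figure 2.6. Python evaluates these calls sequentially, so the sketch shows only the structural transformation, not the concurrency.

```python
def sum_to_loop(n):
    """Sum 1..n with an explicit while loop (the cyclic form)."""
    total, i = 0, 1
    while i <= n:
        total, i = total + i, i + 1
    return total

def sum_to_rec(n, total=0, i=1):
    """The same loop as self recursion: the loop variables become inputs of
    the next activation, and the conditional chooses between returning the
    result and applying the schema again."""
    if i > n:
        return total
    return sum_to_rec(n, total + i, i + 1)

def factorial(x):
    """f(x) = if x <= 1 then x else x * f(x-1) end"""
    return x if x <= 1 else x * factorial(x - 1)
```

For example, sum_to_loop(10) and sum_to_rec(10) both give 55, and factorial(5) gives 120.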


2.4 Data structures 

In this thesis, we are interested in an interpretation of data flow schemas which 
requires the types integer, real, boolean, character_string¹ and structure. We will assume that
the set of operations defined on the data types other than data structures is well understood.
We now define notation for data structures and the set of allowable operations. (The material
presented here is based on [Denns72, Ellis74].)

The strict definition of the semantics of data structures must include all data flow 
actors which have at least one input or output arc for carrying data structures. Thus, the set of
actors would include link, switch, merge, sink, and operator. The function of switch and merge
is purely for controlling the flow of values and is naturally extendable to data structures. The
functions of create, append, select, link and sink determine the number of instances of data
structure values that exist in the system. These actors, therefore, are related to the function of
resource management of storage for data structures. Semantically, the functions of the link and
sink actors are the same as defined previously. The primary type of actors that we define here
will be the class of operators which perform operations on data structures.

1. We restrict ourselves to character_string of bounded length which can be treated as a scalar
value. For character strings of variable length, the implementation will be quite different.
Furthermore, if selector names of a data structure operation are of variable length, the
implementation of data structure operations depends on how variable length character strings are
implemented.

A data structure can be either a nil structure which has no component or a structure 
having n component structures d1, ..., dn whose selector names are respectively s1, ..., sn as shown in
Figure 2.7(a). The selectors are either character strings or integers and each selector name must
be different from all others in the same data structure. Furthermore, these selectors are assumed
to be ordered lexicographically. An alternative linear notation for the structure is

(s1: d1, ..., sn: dn).

The set of data structure operations is defined below, where d and d’ are data structures, s is a
selector name, and c is an object of any type:
(1) create ()
The create operation causes a nil data structure to be returned as the result. (Figure
2.7(b)(1))
(2) append (d, s, c)
The operation returns a data structure d’ which is identical to d except that the s
component is c regardless of whether d already has a component with the selector
name s. (Figure 2.7(b)(2))
(3) delete (d, s)
The result of the operation is a data structure d’ which does not have an s
component. (Figure 2.7(b)(3))
(4) select (d, s)
If d has an s component, the result is the object c associated with that component.
Otherwise, the result is the value undefined. (Figure 2.7(b)(4))
(5) nil_structure (d)
This is a predicate whose value is true if d is nil, otherwise its value is false.

Figure 2.7(a). A Data Structure

Figure 2.7(b). Effects of data structure operations
Examples of the effect of these operations are illustrated in Figure 2.7(c). Notice that the effects of
delete (d, s), and
append (d, s, nil)
are different, since the delete operation would remove the tuple (s, d’) while the append
operation would replace it with (s, nil). In general, it is possible to distinguish between these
two data structures using the select operation, since it returns the nil structure for one while
returning undefined for the other. It should be mentioned that an array is simply a data
structure whose selector names are all integers.
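
The five operations can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the implementation described in this thesis: a data structure is modeled as a Python dict that is never modified in place, the nil structure is the empty dict, and the sentinel UNDEFINED stands for the value undefined.

```python
UNDEFINED = object()  # assumed sentinel for the value "undefined"

def create():
    """Return a nil data structure."""
    return {}

def append(d, s, c):
    """Return a copy of d whose s component is c; d itself is left untouched."""
    new = dict(d)
    new[s] = c
    return new

def delete(d, s):
    """Return a copy of d that has no s component."""
    return {k: v for k, v in d.items() if k != s}

def select(d, s):
    """Return the s component of d, or UNDEFINED if there is none."""
    return d.get(s, UNDEFINED)

def nil_structure(d):
    """Predicate: true exactly when d has no components."""
    return len(d) == 0
```

With d = append(create(), "a", 1), selecting "a" from delete(d, "a") gives UNDEFINED, while selecting "a" from append(d, "a", create()) gives the nil structure, which is the distinction drawn above.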
The set of operations together with the link actors and sink actors provides a complete 
set of operations on data structures.¹ These operations allow one to create dynamic data
structures of arbitrary size as opposed to data structures which are declared to be of fixed
structure and mapped into linear representations at compile time. The function of storage
allocation for the data structure operations is implicit in these operations, while conventional
programming languages which allow this form of dynamic data structures often use explicit
storage allocation primitives.

This form of data structures can represent sparse arrays in a very efficient manner.
Since selector names can be character strings, it is possible to implement algorithms on data
structures without having to explicitly encode the character strings into other forms such as
integers used as subscripts into an array. Thus, the user need not be aware of the particular
structure of the internal representation. The semantics of the data structure operations defined
above is free of side-effects, because a data structure operation always produces a new data
structure without modifying data structures used in other parts of the schema. Thus, the
computation is free of side-effects. We feel these are properties that ease programming tasks.

1. Complete in the sense that the set of data structures is closed under the operations.

Figure 2.7(c). Effects of data structure operations

The implementation of data structures can be based on the notion of items. An item is
a storage node that is associated with a unique identifier (uid) and can store a set of tuples of
the form (s, c) where c is either a uid of another item or a simple scalar value and s is the
selector of the component. Thus, a data structure is represented by a collection of items. In the
definition of data structure operations (except the predicate nil_structure), each operation on a
data structure semantically creates a new data structure representing the result of the operation.
In this implementation of data structures, the result of a select operation is either a scalar value
or simply the uid of the selected component. The implementation of the append operation,
however, must maintain the side-effect free property of these operations. The result of

append (a, s, c)

is the uid of an item containing a new set of tuples which differs from a only in the tuple (s, c).
Using items, an efficient implementation can be defined [Denna since it only requires creating
a new set of tuples and does not copy the entire sets of tuples of the subcomponents. Thus, the
implementation allows many component structures to be shared physically while maintaining
the side-effect free nature of the operations.

There are many implementation considerations that affect the efficiency of these data
structure operations.

First, we must provide mechanisms for resource management. These mechanisms must 
allocate items and must determine when the storage can be reclaimed. The latter must be
dependent on the behavior of the program and on how data structure operations may provide
additional information for the resource manager. In traditional systems using dynamic resource
allocation and automatic resource management, this information is obtained by maintaining a
root node from which all nodes accessible by the computation are traceable.
We choose a different approach to the garbage collection problem. This approach is
possible only because the nature of the data structure operations allows an implementation
that always produces an acyclic structure of items. For each item we include a reference count
which indicates how many references (instances of its uid in the system) to that item exist. Each
data structure operation modifies the reference count of the items. The set of operations that
affect the reference count must include all actors which carry tokens carrying data structure
values; for example, the link actor which copies the uid of a data structure must increment the
reference count of the item, and the sink actor must decrement its reference count. Thus, there is
an overhead associated with each data structure operation for maintaining the reference counts.¹
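
A minimal Python sketch of reference counted items follows. The class names Uid and ItemStore, and the rule that a freshly allocated item starts with a count of one for the token that carries its uid, are assumptions made for illustration rather than details taken from this thesis. Reclamation can recurse safely because the item structure is acyclic.

```python
class Uid(int):
    """Hypothetical marker type that distinguishes item references from scalars."""

class ItemStore:
    def __init__(self):
        self.items = {}   # uid -> {selector: component (a Uid or a scalar)}
        self.refs = {}    # uid -> reference count
        self._next = 0

    def alloc(self, tuples):
        """Create an item; each uid stored inside it counts as one more reference."""
        uid = Uid(self._next)
        self._next += 1
        self.items[uid] = dict(tuples)
        self.refs[uid] = 1            # the token that carries the new uid
        for c in tuples.values():
            if isinstance(c, Uid):
                self.refs[c] += 1
        return uid

    def link(self, uid):
        """A link with two outputs copies the uid, so the count grows by one."""
        self.refs[uid] += 1
        return uid, uid

    def sink(self, uid):
        """A sink discards one reference; at zero the item is reclaimed."""
        self.refs[uid] -= 1
        if self.refs[uid] == 0:
            children = list(self.items.pop(uid).values())
            del self.refs[uid]
            for c in children:
                if isinstance(c, Uid):
                    self.sink(c)      # release the references the item held
```

Every link and sink firing touches the count of the item involved, which is the overhead noted above.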
The other concern is that of the size of the node for storing the tuples. Since the
allocation of a variable size node is quite difficult, we have only seen proposals that use fixed
size nodes. This restriction raises the problem of how to represent a variable size node with
fixed size nodes. An approach is to require that selectors have the property that each can be
considered as a sequence of symbols from a fixed size alphabet. Then a variable size node might be
implemented as a tree of fixed size nodes such that each path from a root node to a leaf node
represents a selector name. We refer readers to further readings [Rumba75, Acker77] on this
subject.

1. It has been argued that the overhead associated with the reference count storage management
scheme may be higher than that of garbage collection schemes on cyclic structures. This
inefficiency argument against the reference count scheme is not valid when we adopt a scheme
called split reference count: a uid to a data structure is conceptually a tuple (uid,
reference_weight); a link of two output arcs that receives (a, n) fires by producing two tokens
carrying (a, n1) and (a, n2) such that n = n1 + n2. We should mention that this is an alternative
form of managing items and its feasibility needs further investigation.
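
The split reference count idea in the footnote can be sketched in Python as below. Only the splitting rule n = n1 + n2 comes from the text; the class name WeightedStore, the initial weight of 64, the way a sink returns weight to the item, and the fallback used when a weight can no longer be split are all assumptions added to make the sketch complete.

```python
INITIAL_WEIGHT = 64  # assumed starting weight for a new item

class WeightedStore:
    def __init__(self):
        self.total = {}   # uid -> total weight held by all outstanding tokens
        self._next = 0

    def alloc(self):
        """Create an item and return the token (uid, reference_weight)."""
        uid = self._next
        self._next += 1
        self.total[uid] = INITIAL_WEIGHT
        return (uid, INITIAL_WEIGHT)

    def link(self, token):
        """Split (a, n) into (a, n1) and (a, n2) with n = n1 + n2.
        In the common case the item itself is not touched."""
        uid, n = token
        if n < 2:
            # Assumed fallback: the weight is exhausted, so add more to the item.
            self.total[uid] += INITIAL_WEIGHT
            n += INITIAL_WEIGHT
        n1 = n // 2
        return (uid, n1), (uid, n - n1)

    def sink(self, token):
        """Return the token's weight; the item is reclaimed at total weight zero."""
        uid, n = token
        self.total[uid] -= n
        if self.total[uid] == 0:
            del self.total[uid]
```

In this sketch copying a reference is a purely local operation on the token, and only discarding a reference needs access to the item.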

Another important characteristic of the operations is that the form of data structures 
created using these operations is always an acyclic graph. This is quite different from 
conventional programming languages which allow one to create arbitrary structures constructed
by manipulating pointers. We have explicitly disallowed such operations for several reasons. 

The creation of cycles is a programming technique which has proved effective in
sequential programming. It is not clear, however, that such techniques are suitable in a
programming language which does not allow side-effects. The programming technique can
indeed be simulated in a language not having cyclic structures by introducing procedures which
interpret the acyclic counterpart of the cyclic structure. It is desirable that we can provide a
comparison of the programming tasks of the two approaches. Unfortunately, we have neither seen
nor found good cases against or for either approach. While we do know that semantics based 
on immutable cycles is a possible approach [Hende?5] it remains to be shown that cycles are 
indeed an essential form of data structures. 

The other reason for disallowing cycles is based on a resource management argument. 
For systems such as the LISP interpreter, the existence of cyclic data structures results in the
need for garbage collection schemes which mark all of the accessible data structures and 
deallocate those that are left unmarked. This has the undesirable effect that a computation is 
interrupted during the process of garbage collection. Some recent works [Baker78, Bisho77] 
have reduced this effect by introducing garbage collection schemes which allow computations to 
be running concurrently during the garbage collection. In a system which does not create cyclic
structures, the garbage collection scheme can be based on reference counts and need not resort
to the elaborate schemes that have been developed.

In this thesis, we have restricted ourselves to acyclic data structures because the
implementation of procedure activations and streams is orthogonal to this issue. Therefore,
we leave this as an area that can be investigated by others.


2.5 Discussion 

The apply actor presented in Section 2.3 requires all input values to be present on
the input arcs to become enabled. This has two implications.

First, the language definable based on the apply actor must define its semantics based
on "call by value", that is, a procedure (or, interchangeably, schema) application is well defined
only for the case when the computations producing inputs to the procedure terminate. This can
be contrasted with the more general form of procedure applications which allows a procedure
application to take place even when the computation of some of the inputs does not terminate.
The more general form of procedure application has a desirable semantic property which is
often referred to as referential transparency or the property of substitution [Stoy77]. Let f and g
be two procedures such that f appears as an application inside of g, and let g’ be the procedure
obtained from g by substituting the text of f in place of the application. In a language that is
referentially transparent the specification of the functional for g will not depend on any
specification of the termination property of f; thus, the functionals for g and g’ will be
equivalent. In a language whose procedure applications must depend on the termination of
the procedure f, the procedures g and g’ would be of different functionals. This is because the
substituted procedure allows the computation to proceed without waiting for all inputs to be
available. The difficulty with designing a system which supports a referentially transparent
language is that it needs mechanisms that detect when the result of a subcomputation is not
required for further computation and prevent the nonterminating subcomputation from wasting
computing resources.

The second implication is that if the operation of the apply actor is implemented in a
straightforward manner, the degree of asynchrony of the computation is reduced. This is because, in
most cases, there are parts of the computation that can proceed as soon as some of the input
values become available and need not be constrained to wait for the arrival of other inputs.
For a referentially transparent language, this asynchrony is achievable, while for the language
with call-by-value semantics this asynchrony is constrained unless one knows that all
computations terminate.

A consequence of side-effect-free data structure operations is that some operations 
which seem rather simple to perform in existing languages become more complicated.
Consider a data structure A from which a data structure A’ is to be constructed which is
identical to A except for the component

select( select( A, "a"), "c").

To construct A’, one must write

append( A, "a",
    append( select( A, "a"), "c", C) ).

Thus, from the criterion of ease of expression, some additional higher level operations need to be
defined.
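
One possible shape for such a higher level operation is sketched below in Python. The function update_path is hypothetical and is not an operation defined in this thesis; data structures are again modeled as dicts that are never modified in place, and select is simplified to assume the component exists.

```python
def append(d, s, c):
    """Side-effect free append: returns a new structure, leaves d unchanged."""
    new = dict(d)
    new[s] = c
    return new

def select(d, s):
    """Simplified select that assumes d has an s component."""
    return d[s]

def update_path(d, path, c):
    """Hypothetical higher level operation: return a structure identical to d
    except that the component reached by following path is c."""
    head, rest = path[0], path[1:]
    if not rest:
        return append(d, head, c)
    return append(d, head, update_path(select(d, head), rest, c))
```

With A = {"a": {"c": 1, "d": 2}, "b": 3}, the call update_path(A, ["a", "c"], 9) builds the same structure as the nested append expression above with C = 9, and A itself is unchanged.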

There are many issues that require further study to understand fully the implication of 
the side-effect free semantics. We have already touched briefly on the issues of cyclic structures.
Another interesting issue relates to the computational complexity of many algorithms that have
been found to be efficient but have not been shown to be equally efficient using side-effect free
operations. Examples of such algorithms are heap sort and merge sort [AhHoU75]. Thus, the
criteria for choosing appropriate algorithms for applications may be significantly different
depending on whether modifications are allowed on existing data structures. Still another area
is the semantics of nondeterminate computations.




Chapter 3. A Textual Language 


In this chapter we introduce a programming language based on the model of data flow 
schemas described in Chapter 2. The language departs from conventional sequential languages 
in many ways. We have removed the notion of sequential control flow of a computation by 
introducing value-oriented semantics. There are no explicit language primitives for introducing 
parallelism. The concurrency of a computation is determined by the data dependency within 
the program rather than by explicit creation of concurrent processes. While it is possible that
compile time analysis can be performed on sequential programs to produce an equivalent 
program of greater concurrency, this does not help programmers to express computations in a 
form which exhibits a high level of concurrency. Furthermore, no compile time analysis has been
able to extract the inherent concurrency from a program containing unnecessary constraints 
which are the result of language features based on the assumption of sequential computer 
organization. 

The language does not have the notion of memory locations or variables commonly 
found in conventional sequential programming languages; instead we introduce the notion of 
naming for identifying a value in a computation in very much the same way mathematical
notations would use names. With the value oriented semantics, we expect programs now can
exhibit the inherent concurrency of an algorithm, and may even provide additional motivation
for designing new algorithms of greater concurrency.


3.1 A value-oriented language 
The language is value-oriented in the sense that each syntactic unit corresponds to a 
function whose evaluation produces a set of values. The computation associated with a syntactic
unit called an expression does not interact with the computation of other expressions in the
program. While the purest form of value-oriented language does not use names for defining
values, we introduce names for defining procedures and for convenience of programming since
naming is a useful mechanism for identifying values of expressions.


In this thesis we will not be concerned about many language design issues that arise in 
making the syntax and the semantics of the language rich enough for a user to program in.¹
The language is intended only to demonstrate the existence of a reasonable syntax and to
facilitate the discussions in later chapters. The set of data types consists of integer, real,
boolean, character string, and structure. We shall call these data types simple data types. The
set of operations defined on integer, real, boolean, and character string are the usual operations
seen in many languages. The operations on structure are the set of data structure operations
given in Chapter 2.
The syntax of the language is given in Figure 3.1. A program consists of a
list of procedure definitions followed by an expression. A procedure definition is of the form:

P = procedure ( a1: T1, ..., am: Tm ) yields ( R1, ..., Rn )
    .. a list of procedure definitions ..
    <expression>;
end P;

This defines a procedure P that requires m input values a1, ..., am of types T1, ..., Tm respectively.
The names a1, ..., am must be distinct and can appear in <expression>. The evaluation of the
procedure yields an ordered set of output values of types R1, ..., Rn, resulting from the evaluation
of <expression>. While each procedure in the list of procedure definitions may itself contain
procedure definitions, we adopt for simplicity the scope rule that all procedure names are
globally defined, that is, no two procedures can have the same name in the entire program.

1. The language described here can be regarded as a subset of the language called VAL in
development at MIT [AckDe78].

Notation: {<E>}* means <E> | <E>, {<E>}*
          {<E>}  means <empty> | {<E>}*

<program> ::= program { <procedure def> } <expression> end

<procedure def> ::= <name> = procedure ( <input list> )
                        yields <output list>;
                        { <procedure def> }; <expression>
                    end <name>

<input list> ::= { <type declaration> }

<type declaration> ::= <name> : <type>

<output list> ::= { <type> }

<expression> ::= <primitive expression>
               | { <expression> }*
               | <let-block expression>
               | <conditional expression>
               | <application>

<let-block expression> ::=
    let { <type declaration> }; { <name def> }; in <expression> end

<name def> ::= { <name> } = <expression>
             | <name def>; { <name> } = <expression>
             | <empty>

<conditional expression> ::=
    if <expression> then <expression> else <expression> end

<application> ::= <name> ( <expression> )

<primitive expression> ::=
      <expression> <primitive operation> <expression>
    | <primitive operation> ( <expression> )
    | <name> | <constant>

<simple data type> ::= integer | real | boolean | character-string | structure

<type> ::= <simple data type> | stream of <simple data type>

Figure 3.1. Syntax of a value-oriented language

An expression has several attributes: arity and ordering. Each expression yields an 
ordered sequence of values. The arity of an expression is the size of the sequence of values it
yields. We give a recursive definition of the arity, A(E), of each of the six types of expressions
as follows:


A( <primitive expression> ) = 1,
A( <exp_1>, ..., <exp_k> ) = A( <exp_1> ) + ... + A( <exp_k> ),
A( <let-block expression> ) = A( let <definitions> in <exp> end )
                            = A( <exp> ),
A( <conditional expression> ) = A( if <exp_1> then <exp_2> else <exp_3> end )
                              = A( <exp_2> ) (and must equal A( <exp_3> )),
A( <application> ) = the number of results listed in the yields clause of the procedure
                     definition.


For a procedure to be well defined, the arity of the expression of the procedure must match the
number of result types declared in the yields clause. Names appearing in an expression must
either be defined in the input list of the procedure or be procedure names.
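
The recursive definition of arity can be sketched in Python as follows. This is only an illustration: the tuple encoding of expressions and the names `arity` and `yields_count` are assumptions made for this sketch and are not part of the language definition.

```python
# Hypothetical encoding of expressions as nested tuples (assumed for
# illustration only):
#   ("prim",)                     a primitive expression
#   ("seq", e1, ..., ek)          the expression list e1, ..., ek
#   ("let", body)                 a let-block; only the body determines arity
#   ("if", pred, then_e, else_e)  a conditional expression
#   ("apply", proc_name)          a procedure application


def arity(expr, yields_count):
    """Return A(expr).

    yields_count maps a procedure name to the number of result types
    declared in its yields clause.
    """
    kind = expr[0]
    if kind == "prim":
        return 1
    if kind == "seq":
        return sum(arity(e, yields_count) for e in expr[1:])
    if kind == "let":
        return arity(expr[1], yields_count)
    if kind == "if":
        a_then = arity(expr[2], yields_count)
        a_else = arity(expr[3], yields_count)
        if a_then != a_else:
            raise ValueError("both branches of a conditional must have the same arity")
        return a_then
    if kind == "apply":
        return yields_count[expr[1]]
    raise ValueError(f"unknown expression kind: {kind}")
```

A procedure is then well defined, in the sense above, when the arity of its body equals the number of types in its yields clause.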

In many situations it is convenient to introduce a name for an expression, because it is
a common subexpression of a larger expression or because it is necessary to build a new
expression whose values are permutations of another. The let-block expression is used for
introducing names, each standing for an expression of arity one. A let-block expression is of the
form:

let { <type declaration> }
    <name-list_1> = <exp_1>;
    ...
    <name-list_k> = <exp_k>;
in <exp> end;


where the names in <type declaration> of a let-block are temporary names meaningful only
within the block, and any reference to these names outside of the block is not defined. These
names must be distinct from each other and may appear in the expressions <exp_1>, ..., <exp_k>,
and <exp>. Since they may conflict with names for inputs of the procedure or names defined in
outer let-blocks, the scope rule is that innermost definitions take precedence over the outer
definitions. Type declaration of names is in the form:

name_1 : type_1, ..., name_n : type_n
where type_1, ..., type_n are each one of the allowable types.

We require that the number of names in a name-list be equal to the arity of the 
expression on the right side of the equality sign. The value of a name in a name-list is the 
value of the corresponding expression appearing on the right hand side of the equal sign, and 
the value must be of the type specified by the type declaration of the name. The value of a
let-block expression is the value of <exp> enclosed by in and end.

A conditional expression is of the form:

if <exp_1> then <exp_2> else <exp_3> end;
The expression <exp_1> must yield a boolean value and be of arity one. The expressions <exp_2>
and <exp_3> must have the same arity, and the corresponding values in each expression must be of
the same type. The value of a conditional expression is: <exp_2> if <exp_1> evaluates to the
boolean value true; <exp_3> if <exp_1> evaluates to false; otherwise, undefined (footnote 1).
A procedure application expression is of the form:
P( <exp> );
where the arity of the expression <exp> is the number of input values required by procedure P
and the type of each value must match that of the input parameters. The result of the
procedure application is a sequence of values of the size and types specified by the yields
clause of the procedure heading.

A primitive expression is an expression that uses the set of primitive operations
defined on the data types. For historical reasons we introduce two forms of primitive
expressions: infix and prefix. An infix expression is of the form:

<exp_1> operation <exp_2>;
where the operation must be a binary operation, and <exp_1> and <exp_2> must be of arity one
and of a type compatible with the operation. A prefix primitive expression is of the form:
operation ( <exp> ),
where the expression must be of arity and type compatible with the operation.


1. We will assume that most data flow actors produce the value undefined if some required
input value is undefined.
An example


We give a procedure that defines a parallel factorial computation below:


Factorial = procedure ( n : integer ) yields integer;

    Product = procedure ( n1 : integer, n2 : integer ) yields integer;
        if n1 >= n2 then n1
        else let middle : integer;
                 middle = ( n1 + n2 ) quotient 2;
                 /* this is an integer division */
             in Product( n1, middle ) * Product( middle+1, n2 ) end
        end
    end Product;

    if n < 0 then error else Product(1, n) end;
end Factorial;
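
For readers who prefer a conventional notation, the same divide-and-conquer structure can be sketched in Python. This is a sequential illustration only; the function names are chosen for this sketch, and raising `ValueError` for negative input is an assumption standing in for the `error` value of the original.

```python
def product(n1, n2):
    """Product of the integers n1..n2, mirroring the Product procedure."""
    if n1 >= n2:
        return n1
    middle = (n1 + n2) // 2  # integer division, as in the original
    # The two recursive calls share no data, so a data flow machine may
    # evaluate them concurrently; here they simply run one after the other.
    return product(n1, middle) * product(middle + 1, n2)


def factorial(n):
    """Factorial of n computed by recursively halving the range 1..n."""
    if n < 0:
        # Assumption: the original yields the value "error" here.
        raise ValueError("factorial is undefined for negative n")
    return product(1, n)
```

The depth of the recursion grows with the logarithm of n, which is what exposes the concurrency: the two halves of each range are independent subcomputations.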


3.2 Correspondence between the language and data flow schemas 

A procedure of m inputs and n outputs corresponds to an (m+1, n+1) data flow schema.
The m input links correspond to the m inputs of the procedure. The data flow schema has an
additional input link called the trigger link whose purpose is to send trigger values to constant
actors in the schema. The additional output link is for passing signal values from sink actors.
As a convention, we require that the trigger input link and the signal output link be present
whether or not constant actors and sink actors are used in the procedure. Internal actors of the
data flow schema evaluate the expression of the procedure.

The translation of a program in the language into data flow schemas is quite simple
due to the value-oriented semantics of the language. We give an informal and recursive
translation procedure below. In this translation procedure each expression is translated into an
(m, n) schema S whose input links are labeled by names. We shall use the notation In(S) to
denote the set of names used as labels for the input links of the schema S. The notation Size(α)
denotes the number of distinct names in the set α; (α ∪ β) denotes the union of two sets α and β;
(α - β) denotes the set that contains the elements in α which are not in β.


A given procedure P contains a set of procedure definitions { P_i } and an expression E.


(1) Translate each procedure P_i into an (m_i, n_i) schema, and add the name P_i to the global
name space of the program. Since procedure names are uniquely defined, there is no
conflict of names in the name space.


(2) The translation of an expression is defined by cases according to the syntactic structure
of the expression.
(a) E = <primitive expression>
If E is a name, then it is translated into a single link actor labeled by that name. If
E is a constant, it is translated into a constant actor whose input arc is connected
from a link actor labeled trigger and whose output arc is connected to a link. If E is
a primitive expression,
<primitive operator> ( E_1 ),
then the resulting schema S for E is an (m, n) schema where m = m_1, assuming E_1 is
translated into an (m_1, n_1) schema S_1. The connection between the input arcs of the
primitive operator and the output links of schema S_1 is implicitly defined by the
ordering of the expression E_1, as shown in Figure 3.2(a). The input links of S_1
become the input links of S. The output arcs of the primitive operator are connected
to the output links of S, and an extra output link is created and labeled signal if the
schema S_1 contains an output link labeled signal. Thus, n is either equal to the
output arity of the primitive operator or is larger than it by one.


(b) E = E_1, ..., E_k
Translate E_1, ..., E_k into S_1, ..., S_k, where each S_i is an (m_i, n_i) schema. The schema
S is an (m, n) schema such that

m = Size( In(S_1) ∪ ... ∪ In(S_k) )

n = n_0 = the sum of n_i, for i = 1, ..., k, if no output links of S_1, ..., S_k are
labeled signal; otherwise

n = n_0 + 1 - (the number of output links labeled signal).
The construction of S from the S_i's is by connecting the set of m input links to the input
links of each S_i according to the labels of their input links and by connecting all
output links of S_1, ..., S_k to the n output links in the order defined by the expression,
such that all output links labeled signal are connected to the only output link of S
labeled signal. (Refer to Figure 3.2(b).)

[Figure 3.2(a), (b). Translation rules. Panel (a), E = <primitive expression>, shows the cases
E = <name>, E = <constant>, and E = <primitive operation> ( E_1 ).]


(c) E = let T, N in E_0 end
The type definition T only provides information for compile time type checking; N is
the list of name definitions containing k names; and E_0 is an expression. The
translation of expressions in N yields an (m_1, n_1) schema S_1, where n_1 = k or k+1
depending on the existence of an output link labeled signal. These k output links
are labeled with names in N according to the definition. The translation of E_0 yields
an (m_0, n_0) schema S_0.
The (m, n) schema S is constructed by cascading S_1 and S_0 such that the set of
input links in S_0 labeled with the names in N are connected to the output links of S_1.
The set of m input links are labeled with names in the set (In(S_1) ∪ (In(S_0) - N)) and
are connected to input links of S_1 and S_0 according to the labels. The output links
of S include the n_0 output links of S_0 and may contain an output link labeled signal
if one of the following conditions is true:

(i) S_0 or S_1 contains an output link labeled signal. In this case, we simply connect
all such output links to that of S.

(ii) The set (N - In(S_0)) is not empty. This implies that the names
defined in N are not all used in the expression E_0 and, therefore, must be
discarded using sink actors which are then connected to the output arc labeled
signal.


The resulting schema is shown in Figure 3.2(c).


(d) E = if E_1 then E_2 else E_3 end

Let S_1, S_2, and S_3 be (m_1, n_1), (m_2, n_2), and (m_3, n_3) schemas translated from E_1, E_2,
and E_3, respectively. For a well formed conditional expression, note that n_2 differs
from n_3 by at most one. Then S is an (m, n) conditional schema such that m = Size(
In(S_1) ∪ In(S_2) ∪ In(S_3) ). This conditional schema contains m' switch actors, where
m' = Size( In(S_2) ∪ In(S_3) ). (Notice that m' may be less than m because some inputs
are used only in the predicate of the conditional schema.) It contains n merge
actors, where n = maximum(n_2, n_3). The true branch of the conditional schema is
obtained by modifying S_2 by adding additional sink actors if m' > m_2; the false
branch is similarly constructed. This construction results in a schema S shown in
Figure 3.2(d).


(e) E = <procedure application> = P ( E_1 )

Let P be the name of a procedure which is defined to have m input values and
to yield n output values. The translation of the expression E_1 produces an (m_1, n_1)
schema. The schema S for E is constructed using a constant actor of value "P" and
an apply actor of m+2 inputs and n+1 outputs, as shown in Figure 3.2(e). The apply
actor requires m+2 inputs because the first input is for the name of the procedure
and the remaining m+1 inputs and n+1 outputs are for the (m+1, n+1) schema translated
from the procedure P.


(3) The application of the translation rule to the expression E yields an (m'+k, n') schema
S, where m' = m or m+1 and n' = n or n+1, if the procedure P is defined to have m input
values and n output values. The extra k input arcs are due to the procedure names used
in the expression E, and m' and n' depend on whether a trigger input link and a signal
output link are produced during the translation. We obtain the final (m+1, n+1) schema for
P by adding constant actors whose values are the k procedure names and by adding a trigger
and a signal link if necessary.

[Figure 3.2(c). Translation rules: (c) E = let T, N in E_0 end]

[Figure 3.2(d). Translation rules: (d) E = if E_1 then E_2 else E_3 end]

[Figure 3.2(e). Translation rules]
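
The bookkeeping of input labels in rules (a) through (d) can be sketched in Python. This is a partial illustration under stated assumptions: the tuple encoding of expressions and the function name `input_labels` are hypothetical, procedure application (rule (e)) is deliberately left out, and only the sets In(S) are computed, not the schemas themselves.

```python
# Hypothetical expression encoding (assumed for illustration only):
#   ("name", x)                  a name
#   ("const", c)                 a constant, fed from the link labeled trigger
#   ("prim", op, e1)             <primitive operator> ( E_1 )
#   ("seq", e1, ..., ek)         E_1, ..., E_k
#   ("let", defs, body)          defs is a list of (names, expr) pairs
#   ("if", pred, then_e, else_e) a conditional expression


def input_labels(expr):
    """Return In(S), the set of input link labels of the schema for expr."""
    kind = expr[0]
    if kind == "name":
        return {expr[1]}
    if kind == "const":
        return {"trigger"}
    if kind == "prim":
        return input_labels(expr[2])  # rule (a): inputs of S_1 become inputs of S
    if kind == "seq":
        # rule (b): union of the input labels of S_1, ..., S_k
        return set().union(*(input_labels(e) for e in expr[1:]))
    if kind == "let":
        _, defs, body = expr
        defined = {n for names, _ in defs for n in names}  # the set N
        in_s1 = set().union(*(input_labels(e) for _, e in defs))
        # rule (c): In(S_1) union (In(S_0) - N)
        return in_s1 | (input_labels(body) - defined)
    if kind == "if":
        # rule (d): union over predicate and both branches
        return input_labels(expr[1]) | input_labels(expr[2]) | input_labels(expr[3])
    raise ValueError(f"unsupported expression kind: {kind}")
```

The number m of input links of the resulting schema is then `len(input_labels(expr))`, which corresponds to Size( In(S) ) in the rules above.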


This concludes the translation procedure. The result of the translation procedure on the
procedure definition for computing the factorial of an integer is shown in Figure 3.3.


3.3 Discussion 

We have not introduced data type declarations for arrays or records. It is desirable to 
introduce additional declaration mechanisms for defining data structures of specific forms such 
as array, record and union types, because such declarations provide effective compile time 
checking which would otherwise be costly at execution time of a program. These are regarded 
as extensions not of our primary concern in this thesis. 

The implementation of procedures as values (or procedure-values) is a very subtle
issue that involves both the representation of procedures and the manner in which procedural
values are used. In this simple language, we have only allowed application of procedures that
are defined at compile time. The use of a global name space for procedure names is overly
restrictive in that there are many situations where definitions of local procedures are desirable
without regard to use of names. The use of a global name space also violates principles of
programming methodology which emphasize the importance of modular program structures
and language structures which guard against the propagation of unintended or malicious
side-effects.

In a more general programming language, we would like to be able to dynamically
create procedures by compiling a procedure definition or by combining existing procedures to
yield another procedure whose function is the result of composition of others. To implement
these operations on procedure-values in an operational model that is free of side-effects presents
several problems.


[Figure 3.3 diagram: F is a (2, 2) schema with a trigger input link.]

Figure 3.3. An example of translation rules on the procedure F:
    F = procedure ( x : integer ) yields integer
        if x =< 1 then x else x * F(x-1) end
    end F


The creation of procedures cannot simply cause updates to the global name space, since 
this would create side-effects for the processes having references to it. Another problem relates 
to the construction of recursive procedure definitions. In Henderson's binding model [Hende75], 
the construction of recursive procedures is cast in an operational model that allows data 
structures containing cycles. In the language presented here, we have been able to allow 
recursive definition of procedures by introducing a global name space such that no cycles are 
created. While it is possible to extend this scheme for constructing recursive procedures 
dynamically, it seems premature to define any implementation of procedural values without 


further conclusions regarding the desirability of data structures containing cycles. 
Chapter 4. Implementation of Data Flow Schemas in a Data Flow Processor


The data flow schema model presented in Chapter 2 is based on a graphical
representation and a data flow interpreter that implements its operational semantics. In this
chapter we present the structure of the data flow processor and an implementation of the
interpreter. Section 4.1 introduces the structure of the data flow processor, and the remaining
sections describe the representation of a schema as a data structure and that of an activation of
a schema. In Section 4.3, we present additional modifications on data flow schemas for
implementing the semantics of procedure activations.


4.1 Data Flow Processor 

The structure of a data flow processor for supporting the execution of recursive data
flow schemas is shown in Figure 4.1. It consists of six subsystems: Functional Units, Structure
Controller, Execution Controller, the Arbitration and Distribution Networks and the Packet
Memory. The processor is based on a packet communication design principle that has been
advocated by Dennis [Denns75]. The arcs between subsystems represent channels through
which packets of the specified types are sent. Two major subsystems of interest to us are the
Packet Memory and the Execution Controller.
The Packet Memory holds data structures as collections of storage nodes, called items,
each of which represents a tuple of a one-level data structure. An item may have scalar values
and unique identifiers of other items as its components, each identified by its selector. Thus, a
collection of items can represent an acyclic directed graph where each arc corresponds to a
unique identifier component of the item representing its origin node. The Packet Memory
maintains a reference count for each item and reclaims physical storage space when items
become inaccessible.

[Figure 4.1. Data Flow Processor. Labels recoverable from the diagram: Packet Memory (Data
Structure), Packet Memory (Procedure Structure), Packet Memory (Activation Record),
Distribution Network, result packet, instruction fetch command.]
Structures held in the Packet Memory have three roles in the execution of data flow
schemas:

(1) as operands for the data structure operations implemented by the Structure
Controller;
(2) as procedure structures that represent data flow graphs and have as components
instructions of a data flow procedure, which are encodings of actors and their output
arcs in a data flow schema; and
(3) as activation records which hold operand values, i.e. tokens that have arrived at an
actor, for each actor instance while waiting for their enabling condition to be satisfied.

The concept of a Packet Memory System was introduced by Dennis, and the design
issues for these systems and the Structure Controller have been studied [Denns75, Acker76]. In
Chapter 6, we discuss in greater detail the properties of the Packet Memory that must be
satisfied to support these structures effectively.

The Execution Controller fetches instructions from a procedure structure and operands 
from an activation record that are stored in the Packet Memory and forms them into operation 
packets. Each operation packet is passed to the Arbitration Network for transmission to an 
appropriate Functional Unit if a scalar operation is called for, or to the Structure Controller for 
the data structure operations. Instruction execution in the Structure Controller and Functional 
Units generates result packets which are sent through the Distribution Network to the 
Execution Controller where they will join with other operands to activate their target 
instruction. 


The Arbitration and Distribution networks are both store and forward networks and
can forward a packet from any one of the input ports to any one of the output ports (footnote 1).
It is important to realize that the delay of packet traversal through the networks is subject to
variations due to the resolution of contention for buffers among packets in the networks. Thus,
the Execution Controller has to store the result packets as operands and detect the enabling
configuration of an actor regardless of the order of arrival of these packets. That this can be
implemented correctly will be seen later when we give detailed representations of procedure
structures and activation records.

Although the Execution Controller, Structure Controller and the Packet Memory are
shown in Figure 4.1 as single units, each is in fact a collection of many identical units. For
example, the Packet Memory subsystem would consist of separate systems, each holding all items
whose unique identifiers belong to a well defined partition of the address space of unique
identifiers. The Execution Controller subsystem consists of identical modules each of which
would serve a distinct subset of procedure activations.


4.2 Procedure Structures and Activation Records 

This section gives several alternatives for the representation of procedure structures
and activation records. Section 4.2.1 presents a simple representation that may incur unnecessary
delays in instruction execution. Section 4.2.2 gives two other alternatives. In the rest of the
thesis, however, we will assume that the simple representation presented in Section 4.2.1 is used.


1. We refer readers to [Bough78] for further reading on a possible approach to the design of
such networks.
4.2.1 Procedure Structures and Activation Records
A data flow schema is represented in the machine by a kind of data structure called a
procedure structure. A procedure structure corresponding to a data flow schema of n actors is a
data structure having n components with integer selector names from 1 to n assigned to the
actors. Each component, called an instruction, is an encoding of an actor and its output arcs.

An actor having n output arcs is encoded as a data structure shown in Figure 4.2. We
shall call the components fields of an instruction. The Operation field defines the function
performed by the actor; the destination fields D1, ..., Dn define the n output arcs. Each destination
field has three subcomponents: the Inst component is the integer selector name of the
destination; the Input-Arc component is an integer designation of an input arc of the
destination; and the Count component is the number of result packets expected by the
destination.

Since multiple instances of the same schema may be concurrently active in a 
computation, each activation (an instance of a procedure execution) is represented as a separate 
activation record whose representation is shown in Figure 4.3. Each actor in an activation is 
uniquely identified by the tuple (A, i), where A is a uid of the root node for the activation 
record and i is the integer assigned to the actor in the procedure structure. A token of value v 
on the k-th input arc of an actor (A, i) corresponds to a result packet that carries the 
information (A, i, k, v, count), where count indicates the number of tokens (or operands) 
required for the enabling of the actor. 

An actor is enabled when the number of result packets having arrived at the operand
record (the i component of the activation record A) is the same as the count in the result
packet (footnote 1). The detection of enabling is a function of the Execution Controller that
processes activation records. Upon enabling of an actor instance (A, i), the instruction of the
actor is fetched from the i component of the procedure structure.

[Figure 4.2. Procedure Structures]

[Figure 4.3. Activation Records: (a) activation record]

An activation record shown in Figure 4.3 has components with integer selectors for
operand records and an additional “text” component that is the procedure structure for the
activation. (In our implementation, this component may be shared by other activations of the
same schema.) An operand record may have as many integer subcomponents as input arcs of
an actor, and also contains an “arrived” subcomponent indicating the number of arrived result
packets (footnote 2). Since an activation record stores values of arrived result packets in its
components, operations on an activation record modify its components. The operations on
activation records are defined below:

(1) create-activation( P )
This returns a new activation record with P as its “text” component and with no
other components.

(2) insert( A, s, v )
The operation adds to A an s component with value v. The selector s is of the
compound form i.k, where k denotes the k-th input arc of the instruction i. The
operation increments the i.“arrived” component by one and returns the incremented
value. If the i.“arrived” component is undefined, the value is taken as zero since it
indicates that the field is non-existent.

(3) remove( A, i )
This operation deletes the i component of A, and is performed by the Execution
Controller upon the delivery of the operation packet for the actor instance (A, i).

(4) free( A )
The operation deletes the entire activation record A. The section on the
implementation of procedure activations gives an example of its use.


1. With the exception of the merge actor, the enabling condition is easily implemented by a test
of equality. Under the restricted use of merge actors in well formed data flow schemas, a merge
actor is enabled when it receives one input token.

2. We have treated each operand record as a structure with selector names. This should be
considered an abstraction that can be implemented in an optimized form. A practical
implementation of the operand record would be based on some mapping of the fields into
operand records of a fixed size.


The Execution Controller consists of independent modules that provide caching of
activation records. For each arriving result packet containing (A, i, k, v, count), the Execution
Controller performs the operation insert(A, i.k, v) and tests the value of the “arrived”
component against the count component of the result packet. If the values are equal, the
instruction is fetched. Upon the arrival of the instruction packet at the Execution Controller,
an operation packet containing the information (A, instruction, operands), with the operands
taken from the activation record, is sent to the Arbitration Network. The i
component of activation record A is then deleted by the Execution Controller.
The fetch command issued to Packet Memory is of the form:

< fetch, P, Inst, A >.

This packet causes the instruction structure of the Inst component of the procedure structure P
to be brought into the Inst component of the activation record A.
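
The handling of result packets described above can be sketched in Python. This is a simplified, single-threaded illustration: the class and function names are assumptions made for this sketch, the procedure structure is modeled as a plain dictionary, and the asynchronous fetch command is replaced by a direct lookup.

```python
class ActivationRecord:
    """An activation record: a "text" component plus operand records."""

    def __init__(self, text):
        self.text = text    # the procedure structure, here a dict i -> instruction
        self.operands = {}  # i -> operand record {"arrived": n, k: value, ...}


def create_activation(p):
    """create-activation( P ): new record with P as its "text" component."""
    return ActivationRecord(p)


def insert(a, i, k, v):
    """insert( A, i.k, v ): store v and return the incremented arrival count."""
    record = a.operands.setdefault(i, {"arrived": 0})  # undefined counts as zero
    record[k] = v
    record["arrived"] += 1
    return record["arrived"]


def remove(a, i):
    """remove( A, i ): delete and return the i component of A."""
    return a.operands.pop(i)


def on_result_packet(a, i, k, v, count):
    """Process a result packet (A, i, k, v, count).

    Returns an operation packet (A, instruction, operands) once the actor
    instance (A, i) is enabled, and None otherwise.
    """
    if insert(a, i, k, v) != count:
        return None
    instruction = a.text[i]  # stands in for the command <fetch, P, Inst, A>
    record = remove(a, i)
    arcs = sorted(key for key in record if key != "arrived")
    return (a, instruction, [record[arc] for arc in arcs])
```

Because enabling is detected by comparing a counter with the count carried in the packet, the outcome does not depend on the order in which the operands arrive.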


4.2.2 Two other alternative representations 
In this section we present two alternative representations of procedure structures and 
activation records that have some advantages over the one presented. 


The procedure structure of the first scheme is the same as that of Figure 4.2, but the
activation records now have a “text” component for each operand record, as shown in Figure
4.4. This component is supplied by result packets destined for the operand record. For each
enabled instruction, the Execution Controller can, therefore, directly use the uid contained in
the “text” component of the operand record to fetch the instruction without having to obtain the
uid of the procedure structure from the activation record, as in the previous scheme. In
this scheme a result packet must, therefore, carry the information (A, i, k, v, count, P), where P
is the uid of the procedure structure.

The second scheme is a further optimization of the first. This scheme eliminates the
redundant information, the “count” and “text” components, carried by all result packets for each
operand record. The procedure structure is shown in Figure 4.5, where the “tag” component of
the destination field is a boolean value of either true or false and signifies that the values for
the “count” and the “text” components of the destination operand record are to be sent if it is
true; otherwise, only the operand value is contained in the result packet. The boolean value for
the “tag” component of each destination structure must be assigned by the compiler such that a
true tag is associated with one and only one input arc of an actor. A schematic illustration of an
example of this assignment is given in Figure 4.6, where the broken arc represents the
destination field to which we have assigned the value true. In this figure, we have chosen the
assignment rule that assigns true to the rightmost input arc of an actor. Note that a merge
actor has two broken input arcs; this is because only one branch of a conditional schema is
executed.

The content of a result packet is the tuple (A, i, k, v, count, P) if the tag for the
destination (A, i) is true; otherwise, it is (A, i, k, v). The structure of an operand record is
shown in Figure 4.7. Initially, the two components “arrived” and “count” are nil. For each
result packet the “arrived” component is incremented by one and the resulting value is tested
against the “count” component (footnote 1). In addition, for a result packet of the form
(A, i, k, v, count, P), the “count” component is written with the value count and the “text”
component is written with the uid P. An instruction is enabled when the values of “count” and
“arrived” are equal.

[Figure 4.4. Activation Records]
[Figure 4.5. Procedure Structures]
[Figure 4.6. An example of tag assignments to the schema shown in Figure 3.3]
[Figure 4.7. Activation Records]
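
The enabling test of the second scheme can be sketched in Python. This is an illustration under assumptions: the operand record is modeled as a dictionary, the function name `receive` is hypothetical, and the sketch writes the “count” component before comparing so that a tagged packet arriving last also enables the instruction.

```python
def receive(record, k, v, count=None, text=None):
    """Process one result packet for an operand record (a dict).

    A tagged packet supplies count and text; an untagged packet carries
    only the operand value v for input arc k. Returns True when the
    instruction becomes enabled.
    """
    record[k] = v
    # A nil (missing) "arrived" component is taken to be zero.
    record["arrived"] = (record.get("arrived") or 0) + 1
    if count is not None:
        record["count"] = count  # written only by the tagged packet
        record["text"] = text    # uid of the procedure structure
    # While "count" is still nil the comparison fails, so the instruction
    # cannot be enabled before the tagged packet has arrived.
    return record.get("count") == record["arrived"]
```

Since exactly one input arc of an actor carries a true tag, every operand record receives its “count” and “text” once, regardless of arrival order.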

Notice that in all of the schemes presented, the instruction for an actor is
fetched only when it becomes enabled. Thus, there is an added delay between the enabling of
an actor and the delivery of an operation packet. A further elaboration of the instruction
execution scheme can be based on use of the “tag” field and can allow the instruction of an
actor to be fetched before the actor becomes enabled (footnote 2). This is achieved by requiring
each subsystem that processes an operation packet to issue to the Packet Memory an instruction
fetch for the destination operand record as it awaits the arrival of other operands (footnote 3).

These two schemes introduce additional changes to the implementation of procedure
activations, since the input links and output links serve as the interface between procedure
activations and must conform to the schemes described. We will not detail such changes, and
will present the rest of the thesis based on the scheme described in Section 4.2.1.


1. If the “arrived” component is nil, it is assumed to be zero.

2. This is similar to the instruction fetching schemes of lookahead processors. We mention
that in this scheme the assignment of tags may be important.

3. In this case, the enabling condition can be modified such that it treats the instruction as an
additional operand required for the enabling of the instruction.


4.3 Procedure activations
The problem of implementing procedure activations has been investigated in
[Misun78, Miran77]; we present here a scheme that is consistent with our representation of
procedures. To implement application of schemas, we introduce four additional actors: linkage,
make-ret, distribute and extract-uid. The symbols for these actors are shown in Figure 4.8.

[Figure 4.8. Actors for implementing procedure application: (1) linkage, (2) make-return,
(3) distribute, (4) extract-uid]

For brevity, we illustrate the implementation with an example. The schema shown in Figure 4.9 is
a translation of the schema for the factorial function shown in Figure 3.3, and embodies the
additional actors. This embodiment is based on an instruction assignment rule that assigns
integers to each actor of the augmented schema. The modification creates an (m+2, n) schema
from an (m, n) schema translated from a textual program as described in Section 3.2. The
instruction assignment rule is the following (referring to Figure 4.9):


(1) The link actor labeled ret is assigned the integer one.

(2) The link actor labeled env is assigned the integer two.

(3) The remaining m input link actors are respectively assigned 3, ..., and m+2.

(4) The linkage actors that supply input values to the new activation and the actors that
receive output values from it are respectively assigned consecutive integers. In
Figure 4.9, the actors labeled I, I+1, ..., I+3 are linkage actors supplying input values,
and the link actors labeled J and J+1 receive result values from a procedure activation.

(5) The assignment rule for the remaining actors is arbitrary.


In Figure 4.9, the first input link actor labeled “ret” expects a value that encodes the
destinations to which output values will be returned. The encoding consists of the uid of the
activation record, the smallest integer assigned to the link actors receiving output values, and
the number of output values. The distribute actor decomposes this tuple into destinations and
forwards them to the output linkage actors of the new activation. A linkage actor communicates
between two different activations and expects three inputs: a value v, an instruction number i,
and a uid of another activation A. The firing of a linkage actor (A1, j) in an activation A1
sends to the operand record (A, i) the result packet (A, i, 1, v). In addition, this linkage actor
may have a signal output arc destined for an actor within the activation A1.
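
The decomposition of the return encoding and the firing of a linkage actor can be sketched in Python. This is only an illustration under stated assumptions: the names `distribute`, `linkage_fire`, and `operand_records` are hypothetical, uids are modeled as strings, and the result packet is modeled as the tuple (A, i, 1, v) quoted in the text.

```python
def distribute(ret):
    """Decompose a return encoding (A, I, K) into K destinations.

    A is the uid of the caller's activation record, I the smallest
    instruction number among the link actors receiving results, and
    K the number of output values.
    """
    uid, first, count = ret
    return [(uid, first + offset) for offset in range(count)]


def linkage_fire(operand_records, dest_uid, dest_instr, value):
    """Fire a linkage actor: deliver `value` to the operand record (A, i)."""
    packet = (dest_uid, dest_instr, 1, value)  # result packet (A, i, 1, v)
    operand_records.setdefault((dest_uid, dest_instr), []).append(packet)
    return packet


# Hypothetical caller activation "A7" expecting two results at
# instructions 10 and 11.
destinations = distribute(("A7", 10, 2))
records = {}
for (uid, instr), result in zip(destinations, [120, 0]):
    linkage_fire(records, uid, instr, result)
```

Each output linkage actor of the invoked activation receives one of the destinations, so results flow back to the caller without the callee knowing anything else about it.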

The second link actor expects the uid of the environment structure that contains all
procedure structures with their names as selectors.


The semantics of the apply actor is implemented by using create-activation to allocate
an activation record. The create-activation actor requires two inputs: a uid of a procedure
structure and a signal that is generated only when all input arguments for the activation have
been computed; and its output is a free uid A. The uid of a procedure is selected from the
environment structure using the name of the procedure. The uid of the activation record A is
sent to the linkage actors I, ..., and I+3 which forward arguments to the activation. For these
linkage actors the instruction numbers of the destinations are respectively assumed to be 1, ...,
and 4. The value encoding the return destinations for the new activation is constructed by the
const-ret actor using the output of the extract-uid actor, which extracts from a result packet the
uid of the activation; and it is sent to the first input link of the invoked activation through the
linkage actor I.

[Figure 4.9 diagram: F : (4, 2) schema, whose first input link actor is labeled ret and is
assigned the integer 1.]

Figure 4.9. An example for the implementation of the apply actor

A free actor releases the activation record and is enabled only when all activities within
the activation have ceased.¹ In Figure 4.9, notice that the signal output arcs of the output
linkage actors on the bottom of the figure are connected to the free actor through a sink. Thus,
the free actor cannot be enabled until all output linkage actors have delivered their output
values. The uid of the activation record is returned to the pool of free uids managed by the
Packet Memory.

The translation of textual programs into augmented schemas is straightforward and
can be based on the translation rules presented in Section 3.2, and we will omit further details
of the process.


1. This is guaranteed by the compiler that translates textual programs into data flow schemas.


4.4 Tail procedure application

In sequential programming languages, a tail procedure application is a procedure
application that occurs as the last statement in another procedure. For our value-oriented
language, a tail procedure application is identifiable as a procedure application in the
expression of the body of a procedure whose output value is returned as the value of the entire
procedure. For languages that have iterative constructs, the translation of an iteration loop into
its equivalent recursive form of computation results in a tail recursive procedure. Often, some
recursive programs can be transformed into tail recursions as well.¹ In programs with tail
procedure applications, the result of a tail procedure application of P2 within P1 is simply the
result of the procedure application P2. (If P1 and P2 are the same, then they form a tail
recursive procedure.) Such tail procedure applications occur frequently enough that the
activation record of P1 should be deallocated as soon as possible. Without such optimization,
the outermost procedure activation remains until all nested procedure activations are freed.
Since the subject of compiler optimization is not within the scope of this thesis, we will
simply present an example to illustrate how such optimization might be accomplished with the
procedure application scheme introduced. In Figure 4.10, we give an alternative recursive
program for the computation of the factorial function. In this schematic illustration, the link
actor labeled ret provides the necessary information for the compiler to add actors to form the
necessary linkage between the deepest nested procedure activation and the outermost procedure
which invoked the factorial computation.

In Figure 4.11, we give examples of situations where tail procedure application can be
optimized. While it is possible to optimize reasonable cases of such tail procedure
applications, it is not clear that the complexity introduced is desirable.


1. These translations are not assumed to be an important part of the task of the compiler, but
such optimization may be embedded if feasible. For a user language which has iteration
constructs, the translation would naturally lead to tail recursions, and thus the opportunity for
this optimization should be taken advantage of.


[Figure 4.10 diagram: G : (5, 2) schema, with input link actors ret, env, trigger, x, and y,
and a create-activation actor for the tail application of G.]

G = procedure( x, y : integer ) yields integer
      if x < 2 then y
      else G( x-1, (x-1)*y )
      end
end G


Figure 4.10. An example of a tail procedure application 
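
As an informal illustration of why the caller's activation record can be released early, the tail procedure application can be written in Python next to an equivalent loop. This is a sketch only: it assumes the termination test is x < 2 and that the factorial of n is obtained by the initial call G(n, n) for n ≥ 1; the names `G_iterative` and `factorial` are hypothetical.

```python
def G(x, y):
    """Tail-recursive form: the recursive call is the entire result."""
    if x < 2:
        return y
    return G(x - 1, (x - 1) * y)


def G_iterative(x, y):
    """The same computation with one frame reused on every step.

    Because nothing remains to be done in the caller after the tail
    application, the caller's state can be overwritten in place.
    """
    while x >= 2:
        x, y = x - 1, (x - 1) * y
    return y


def factorial(n):
    # Assumes n >= 1; the accumulator starts at n itself.
    return G(n, n)
```

The loop form corresponds to freeing the activation record of the caller as soon as the tail application has been issued.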
[Figure 4.11 diagram: f2 is a tail procedure application within f1, and f2 contains a tail
application of f1.]

Figure 4.11. Examples of tail recursions


4.5 Discussion 

In this chapter, we have presented a processor that is capable of supporting the 
semantics of the data flow schemas and the concurrency of operation. We have presented an 
abstract view of the operation of the processor and have discussed several alternatives of the 
instruction execution schemes. The choice of the execution scheme would depend on many 
factors that need further investigation. Some of these factors are: the delay characteristics of 
packet traversal through the networks, and the trade-off between the amount of storage needed 
to store operand records and the delay of instruction execution. 

The instruction execution schemes we have presented are all called piecewise copying 
schemes, because each instruction is not fetched until the instruction is known to become an 
enabled instruction. Another alternative is to fetch all instructions of a procedure structure into 
an activation record at the time of creating the activation. This scheme would require that the 
instructions for actors on one branch of a conditional schema be deleted when the test outcome 
of the predicate for the conditional schema becomes known. This scheme also suffers from the
larger storage required to store the instructions at any instant of time during the activation.
Its advantage is that instructions can be fetched possibly with a single request to the Packet 
Memory rather than with as many requests as the piecewise copying schemes; thus, it reduces 
significantly the amount of packet traffic to the Packet Memory. At this level of discussion, it is 
not clear that this scheme offers greater advantages. To analyse this further would require 
further elaboration of the architecture and some understanding of the behavior of piecewise 
copying schemes. 

The implementation of data structures and activation records by the Packet Memory 
has not been discussed in this chapter. We elaborate on this subject further in Chapter 6. 

We have not detailed the translation from the language to the augmented schemas, but
the details are straightforward and present no additional difficulties once the translation rule
presented in Section 3.2 is understood.
Chapter 5. Stream, Nondeterminacy, and Forall


In this chapter we introduce several extensions to the language described in Chapter 3. 
These extensions are useful for expressing many forms of computation which are not 
conveniently expressible in conventional programming languages. Streams are an important 
abstraction for expressing computations on sequences of values. The implementation of this 
abstraction does not constrain the inherent concurrency of these computations and is guaranteed 
to be determinate when primitives for nondeterminacy are not used in the program. Another 
form of concurrency arises when a procedure is applied on all components of a data structure to 
produce new data structures or scalar values. The forall construct introduced in section 5.3 is a 
useful feature for expressing this form of concurrency. 

Nondeterminate computations, computations that may depend on the timing of
execution, can be expressed by merging two streams in a nondeterminate manner. It is
important to realize that there may be computations which are not easily expressible with this
extension of the language. This limitation is due to our lack of understanding of semantics for
nondeterminate computations and of how such computations can be expressed in a
value-oriented language.


5.1 Streams 

The concept of a stream is an alternative approach for expressing computations that
have conventionally been expressed as coroutines or a set of cooperating processes. For
example, the organization of a compiler is often viewed as a set of coroutines each
corresponding to a phase of the compiler, and we often view processes that perform input and
output operations as a set of concurrent processes that coordinate using process synchronization
primitives.

The significance of programming using streams has been recognized in many works on
formal semantics [Landi65] and on programming languages [McIlr68, Denns69, Burge75,
FreWi78].

There are many reasons for expressing computations in these forms. Large
computations tend to create many large intermediate data structures that take up storage space.
Coroutine mechanisms are often used to alleviate this problem by partitioning intermediate data
structures into smaller units such that the total amount of storage used for intermediate data
structures is reduced. The second reason is to allow these subcomputations to be concurrently
executable by using explicit synchronization primitives. The third and subtler reason is that
program structures expressed in these forms are more modular in the following sense: program
modules can be expressed as functions over streams and their overall behavior can be
characterized as compositions of these functions using denotational semantics [Kahn74].

Writing programs for applications that lead naturally to these forms of computations,
however, has been difficult in sequential programming languages that have explicit coroutine
mechanisms and synchronization primitives. Because these primitives require explicit
initialization of either control sequences or common synchronization variables, the correctness of
these programs is more often than not difficult to establish and programming errors may result
in deadlocks or unwanted nondeterminacy.

Since many of these computations are inherently determinate, it is desirable to be able
to express them in a more structured manner and without these undesirable properties. Using
streams as presented here, one can express computations of these forms such that the inherent
concurrency is not lost and the result of the computation is determinate and free of deadlocks.
5.1.1 Stream operations

A stream is a sequence of values, all of the same type, that are passed in succession,
one at a time, between program modules. The operations on values of type stream of T are
defined below, where s and s’ are streams, and c is a value of type T.

(1) []
    The result is the empty stream, which is the sequence of length zero.

(2) cons ( c, s )
    The result is a stream s’ whose first element is c and whose remaining elements are
    the stream s.

(3) first ( s )
    The result is the value c which is the first element of s. If s = [], the result is
    undefined.

(4) rest ( s )
    The result is the stream left after removing the first element of s. If s = [], the
    result is undefined.

(5) empty ( s )
    The result is true if s = [], and is false otherwise.

For a non-empty stream s, the following property is satisfied:

    s = cons( first( s ), rest( s ) ).

We shall use [1, 2, 3] to denote a constant of type stream of integer whose stream
elements are the integers 1, 2, and 3. Using this notation we give examples of operations on
stream values below:

Let x = [1, 2, 3] and y = 5, then

    first( x ) = 1,
    rest( x ) = [2, 3],
    cons( y, x ) = [5, 1, 2, 3],
    empty( x ) = false, and
    empty( [] ) = true.
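
The five operations can be mirrored with Python tuples to check the examples mechanically. This is a minimal sketch, not the implementation discussed later in the chapter: streams are modeled as immutable tuples, `EMPTY` is a hypothetical name for [], and "undefined" is modeled by raising an exception.

```python
EMPTY = ()  # the empty stream []


def cons(c, s):
    """Stream whose first element is c and whose remaining elements are s."""
    return (c,) + s


def first(s):
    if not s:
        raise ValueError("first is undefined on the empty stream")
    return s[0]


def rest(s):
    if not s:
        raise ValueError("rest is undefined on the empty stream")
    return s[1:]


def empty(s):
    return len(s) == 0


x = (1, 2, 3)
y = 5
```

With these definitions, the property s = cons(first(s), rest(s)) holds for every non-empty tuple.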


5.1.2 An example program

The problem of generating all prime numbers less than a given integer n is a good
computation for illustrating how our data flow execution scheme can express highly concurrent
computation using streams. The sieve of Eratosthenes [Knuth69] expressed in our textual
language is presented in Figure 5.1.

The procedure “generate” produces the sequence of integers beginning with 2 which is
processed by “sieve” to remove nonprime elements. Procedure “sieve” operates by taking the
first element of its input as a prime, all multiples of which are removed by “delete”
before “sieve” is applied recursively to the remaining elements of its input stream.

In Figure 5.2, we show a snapshot of the execution of the program prime_generator. It
can be seen that a substantial amount of concurrency exists in the computation if each
activation of the procedure “sieve” can be executed as soon as the first element in the input
stream is available. Section 5.2 shows how this concurrency can be achieved.


5.2 Implementation of streams

In this section we first present a correct and efficient implementation of streams, and
then discuss why an alternative scheme is not adequate. The alternative scheme is
presented here because it is a natural consequence of thinking in terms of tokens in the data
flow model of computation, but it implements the semantics of the language neither correctly
nor efficiently.


prime_generator = procedure ( n : integer ) yields stream of integer;

    generate = procedure ( i, n : integer ) yields stream of integer;
        if i > n then []
        else cons ( i, generate ( i+1, n ) )
        end;
    end generate;

    sieve = procedure ( s : stream of integer ) yields stream of integer;
        if empty ( s ) then []
        else let x : integer, s2, s3 : stream of integer;
                 x, s2 = first ( s ), rest ( s );
                 s3 = delete ( x, s2 );
             in cons ( x, sieve ( s3 ) )
             end;
        end;
    end sieve;

    delete = procedure ( x : integer, s : stream of integer ) yields stream of integer;
        if empty ( s ) then []
        else let y : integer, s2, s3 : stream of integer;
                 y, s2 = first ( s ), rest ( s );
                 s3 = delete ( x, s2 );
             in if divide ( x, y ) then s3
                else cons ( y, s3 )
                end;
             end;
        end;
    end delete;

    sieve ( generate ( 2, n ) );

end prime_generator;


Figure 5.1. A prime number generator using streams
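
For readers who want to trace the program by hand, the three procedures can be transcribed into Python. The sketch below is sequential and eager, so it shows only the values computed, not the concurrency; it assumes that streams are modeled as tuples, that "generate" stops once i exceeds n, and that divide(x, y) means "x divides y".

```python
def generate(i, n):
    """The stream i, i+1, ..., n (empty when i > n)."""
    return () if i > n else (i,) + generate(i + 1, n)


def delete(x, s):
    """Remove every multiple of x from the stream s."""
    if not s:
        return ()
    y, s2 = s[0], s[1:]
    s3 = delete(x, s2)
    return s3 if y % x == 0 else (y,) + s3


def sieve(s):
    """Keep the first element as a prime and sieve the filtered rest."""
    if not s:
        return ()
    x, s2 = s[0], s[1:]
    s3 = delete(x, s2)
    return (x,) + sieve(s3)


def prime_generator(n):
    return sieve(generate(2, n))
```

Each recursive call of `sieve` corresponds to one activation in the snapshot of the execution; in the data flow implementation these activations overlap in time instead of running one after another.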


[Figure 5.2 diagram: the activation of prime_generator with input n : integer, showing the
first through k-th activations of sieve connected by arcs of type stream of integer and
producing a stream of prime numbers.]

Figure 5.2. A snapshot for the prime number computation


5.2.1 Implementation of stream operations 

The implementation presented here is based on translation of each stream operation
into one or more data structure operators that include operations on “holes”. The notion of
holes used here originated in the work of Henderson [Hende75], who used the term “tokens”;
and it differs from the notion of suspensions discussed by Friedman and Wise [FreWi78].¹ In
this implementation, an empty stream is represented by the nil structure, and a stream s is
represented by a data structure whose “first” component is first( s ) and whose “rest” component
is the data structure representation of rest( s ).

The implementation of the stream operations (except cons) is shown in Figure 5.3, and
is simply a replacement of a stream operation by a simple data structure operation. The cons
actor is implemented by the actors shown in Figure 5.4, where the actors create-hole and
write-hole are special data structure operators defined as follows:²

The output of a create-hole actor is an unfilled hole H, which is a uid and a tag in
{filled, unfilled}. The tag of a hole represents its state and affects operations on it: in
the unfilled state, all data structure operations on the hole, except the write-hole
operation, are simply pooled. Upon the completion of the write-hole(H, v) operation, the
hole H changes its state to filled and contains the value v; and all previously pooled
and subsequent operations are processed without further queuing.
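
The behavior of a hole can be sketched as a small Python class. This is an illustrative model only: the class `Hole`, the method `apply`, and the use of callables to stand for pooled data structure operations are assumptions made for the sketch, and Python object identity stands in for the uid.

```python
class Hole:
    """A hole: a tag in {unfilled, filled} plus a pool of waiting operations."""

    def __init__(self):
        self.filled = False
        self.value = None
        self.pooled = []  # operations waiting for write_hole

    def apply(self, operation):
        """Run operation(value) now if filled; otherwise pool it."""
        if self.filled:
            operation(self.value)
        else:
            self.pooled.append(operation)

    def write_hole(self, value):
        """Fill the hole once and release the pooled operations in order."""
        if self.filled:
            raise RuntimeError("only one write-hole is allowed per hole")
        self.filled, self.value = True, value
        waiting, self.pooled = self.pooled, []
        for operation in waiting:
            operation(value)


def create_hole():
    return Hole()


log = []
h = create_hole()
h.apply(lambda v: log.append(("early", v)))  # pooled: the hole is unfilled
h.write_hole(42)                             # releases the pooled operation
h.apply(lambda v: log.append(("late", v)))   # processed without queuing
```

The single-writer check reflects the restriction that exactly one write-hole is performed on each hole, which is what rules out races between writers.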


1. The notion of suspension allows one to force the evaluation of some value which is
promised; a promised value is not necessarily evaluated as soon as possible.

2. The implementation of the write-hole operation must, in addition to writing the value in
the address, allow the operations pooled for the hole to proceed. It should be mentioned that
the operations on holes are used in a restricted context such that only one write-hole operation
is performed on each hole; thus, there is no possibility of a race between several write-hole
operations on the same hole. A simple way to implement the pooling of operations is to queue
them as a list, the uid of whose head is stored in the hole.


[Diagrams: (a) first, (b) rest, and (c) empty, each taking an input of type stream of T and
translated into an operation on a structure; (d) cons, with an input and an output of type
stream of T.]

Figure 5.4. Cons


Referring to Figure 5.4, the effect of the cons actor is to construct a data structure
whose “first” component is the value v and whose “rest” component is the hole H from the
output of the create-hole actor. The write-hole actor receives as inputs the hole H and a data
structure representing a stream. Notice that the implementation of the cons actor creates an
output after receiving the input value v and does not wait for the completion of the write-hole
operation.¹ The write-hole operation has a signal output used for ensuring that the activation
is not deleted before its operation is completed.

The first(s) actor is translated into a select(s, “first”), and the rest(s) actor is translated
into a select(s, “rest”) data structure operation.² The empty actor is translated into the predicate
nil-structure(s).

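
The representation can be illustrated with a short Python sketch in which a stream cell is a dictionary with “first” and “rest” components. The names `cons_begin`, `select`, and `NIL` are hypothetical, and the sketch simplifies pooling: an operation that reaches an unfilled hole raises an exception instead of being queued.

```python
NIL = None  # the nil structure represents the empty stream


class Hole:
    def __init__(self):
        self.filled = False
        self.value = None


def cons_begin(v):
    """Emit the cell as soon as v arrives; its rest is an unfilled hole."""
    hole = Hole()
    return {"first": v, "rest": hole}, hole


def write_hole(hole, stream):
    hole.filled, hole.value = True, stream


def select(s, selector):
    component = s[selector]
    if isinstance(component, Hole):
        if not component.filled:
            # In the scheme described in the text this operation is pooled.
            raise LookupError("hole is still unfilled")
        return component.value
    return component


def first(s):
    return select(s, "first")


def rest(s):
    return select(s, "rest")


def empty(s):
    return s is NIL  # the predicate nil-structure(s)


cell, hole = cons_begin(2)
head_before_fill = first(cell)  # available before the rest is known
write_hole(hole, NIL)
```

The point of the sketch is the ordering: the first component of the output is usable before the stream argument of cons has been produced.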
Using the earlier example program for the prime number generation, we illustrate the
concurrency of operations on streams. The schemas for the two procedures “sieve” and “delete”
are shown in Figures 5.5 and 5.6. From the schema for “sieve”, it can be seen that the output of
the cons actor is generated after the first value in the input stream from “generate” is made
available as the “first” component of the input stream. The second prime number is produced
by the second activation of “sieve” and is not available until the first value of the output
stream of “delete” becomes available. Figure 5.7 shows how various activations of schemas
may relate to each other, where we use the notation D_ij to denote the j-th activation of “delete”
within the i-th activation of “sieve”, S_i.


1. By making the “first” component a stream, the language could be extended to include stream 
of < stream type >. 

2. Without going further into the details of the implementation of data structures, we simply 
state the requirement that operations on data structures with holes as components have the 
property that once the holes are filled, they behave as normal data structures. 


sieve = procedure ( s : stream of integer ) yields stream of integer;
    if empty ( s ) then []
    else let x : integer, s2, s3 : stream of integer;
             x, s2 = first ( s ), rest ( s );
             s3 = delete ( x, s2 );
         in cons ( x, sieve ( s3 ) ) end; end;
end sieve;

[Figure 5.5 diagram: schema with inputs trigger and s (a stream).]

Figure 5.5. Data Flow Schema for “sieve”


delete = procedure ( x : integer, s : stream of integer ) yields stream of integer;
    if empty ( s ) then []
    else let y : integer, s2, s3 : stream of integer;
             y, s2 = first ( s ), rest ( s );
             s3 = delete ( x, s2 );
         in if divide ( x, y ) then s3 else cons ( y, s3 ) end;
    end
end delete;

[Figure 5.6 diagram: data flow schema for “delete”, with inputs trigger, x, and s.]


5.2.2 A token passing scheme

To illustrate the difficulty of implementing streams using “token passing”, we introduce
a set of data flow actors for streams [Weng75]. These actors are defined over streams in the
sense that an arc typed stream carries a sequence of tokens of the same type terminated by a
special end_of_stream (or est) token; hence the term “token passing”. The notation and the
operational semantics of data flow actors for stream values are shown in Figure 5.8, where the
behavior of each actor is described by a set of firing rules based on the configuration of tokens
and the state of the actor. Each actor, except the est and st-link actors, has two states, first and
rest, and is initially in the first state.

An est actor is simply a constant function which generates the special est token. A cons
actor enters the rest state after placing a token from the first input arc on the output arc, and
returns to the first state upon passing from the second input arc all tokens ending with an est
on its output arc. A first actor enters the rest state after placing a token from its input arc on
its output arc, and returns to the first state upon absorbing all remaining tokens in the stream.
A rest actor enters the rest state after absorbing the first token, and returns to the first state
upon passing all remaining tokens in the input stream. An empty actor tests if a stream is
empty. In the first state, if the arriving token is an est token, the output is true and the actor
returns to the first state; otherwise, the output is false and it enters the rest state. The actor
returns to the first state after the remaining tokens are absorbed. An st-switch actor takes a
boolean input and a stream input; the tokens forming a stream are passed to the output arc
selected by the boolean value. An st-merge simply passes the stream to the output from one of
the input arcs. We restrict the use of st-switch and st-merge actors only to the construction of
conditional schemas, corresponding to the restriction imposed on switch and merge actors
presented in Chapter 2. An st-link actor replicates a stream by copying each arriving token and
by distributing the copies to the output arcs. An st-sink is a sink actor for stream values and
produces a signal to the output arc when an est token is absorbed.

[Figure 5.8(a) diagrams: (i) est, (ii) cons, (iii) first, (iv) rest.]

Figure 5.8(a). est, cons, first, and rest

[Figure 5.8(b) diagrams: empty, st-switch, st-merge, st-link, and (ix) st-signal, with arcs of
type stream and signal.]

Figure 5.8(b). empty, st-switch, st-merge, st-link, and st-signal


Deadlocks 

The set of actors presented above does not implement stream operations correctly,
because the substitution of these stream actors for stream operations results in a schema that
may deadlock when the predicate of a conditional subschema is an arbitrary expression on
streams. This deadlock situation is best illustrated by an example.


Consider the conditional expression, C: 
if first( rest( rest( s )) ) 
then s else rest(s) end; 
where s has the type stream of boolean. 


The translation of the conditional statement C yields a conditional schema S shown in Figure
5.9. The predicate of the schema S consists of a chain of stream actors. Execution of the
schema for an input stream s = [true, true, ..., true] would deadlock because the input link
marked with the symbol * is prevented from firing by the left output arc holding a token. This
situation arises when the predicate controlling the st-switch actor requires an arbitrary number
of input tokens to produce the decision outcome. Most predicates, however, can be analyzed at
compile time so that additional link actors are added between the input st-link actor and the
st-switch actors to avoid deadlocks.

This example illustrates an important property: the arcs of the data flow schema
are finite buffers. In a computation model that allows infinitely buffered arcs, it can be shown
that the history of tokens passing through each arc agrees with the history obtained by the
mathematical characterization proposed by Kahn [Kahn74]. For computation models based on
arcs of bounded size buffers, the history observed is a prefix of that observable if arcs are
unbounded buffers. No mathematical treatment has been found which shows how to derive the
exact history for models with finitely buffered arcs. This property of data flow schemas is
undesirable, since the output history would depend on the amount of buffering provided by the
number of link actors in a data flow path and cannot be characterized in a clean formal
semantics.

[Figure 5.9 diagrams: (i) initial configuration; (ii) after one firing of the st-link labeled *.
In (ii) there is no actor enabled; notice that the st-link labeled * is not enabled because the
output arc to the st-switch is occupied by a token.]

Figure 5.9. An example of a deadlock situation


Inefficiency 

We use the prime number computation presented in Figure 5.2 to illustrate the
inefficiency of implementing streams as a sequence of tokens passed along an arc. Referring to
Figure 5.2, if we regard each stream operation as a token passing actor, the computation is
inefficient, because stream actors form a chain that all tokens in a stream must travel through
during the computation. For example, the prime number that is generated by S_i must travel
through the chain of (i-1) cons actors to reach the output of S_1. In fact, the number of
firings of a data flow actor to process a stream of length n is proportional to n, and for a chain
of n actors it is proportional to n² in the worst case.

The rate at which streams are generated or consumed, however, is not necessarily
reduced due to this traversal, because all tokens can be traversing a chain of stream actors
simultaneously if the execution time of all stream operations does not have a large variation.
The execution delay caused by the traversal would be much larger if some stream actors in a
chain are delayed such that sections of the pipeline containing the stream actors are void of
stream elements.


5.3 Forall 
In many applications, operations on components of a data structure can be performed
concurrently. We present a construct for expressing concurrent computations on arrays. First,
we define a data type array of <simple data type>. The form of a forall expression is:

    <forall expression> ::= forall <range clause> <eval clause> end;
    <range clause>      ::= <name> in [<expression1>, <expression2>]
    <eval clause>       ::= { eval <operation> <expression> }*
                          | let {<type decl>}; {<name def>} in <eval clause>;

It is required that <expression1> and <expression2> are of arity one and of type integer.
Furthermore, the values lb = <expression1> and ub = <expression2> must satisfy lb ≤ ub. The
expressions in the eval clause can contain references to N, the <name> of the range clause, and
must be of arity one. The result of the forall expression is an expression of arity k, where k is
the number of eval's in the eval clause. Its j-th value is equivalent to the result of the following
expression:

    E_j( N = lb ) O_j E_j( N = lb+1 ) O_j ... O_j E_j( N = ub ),

where O_j and E_j denote the operation and the expression in the j-th eval clause, and the
notation E_j( N = i ) denotes the j-th expression evaluated using the free variable N with the
value i. For the above expression to be well defined, we further require that the operations O_j
are binary (requiring two operands) and associative.

Consider the following example:

    forall i in [5, 100]
        eval + A[i]
        eval * ( A[i] + B[i-3] - i );
    end;

The resulting expression is of arity two: the first value is simply the sum of all values
A[5], ..., A[100] of the array A, and the second value is the product of the expressions
A[i] + B[i-3] - i, for i ranging from 5 to 100.
The construct can be easily translated into a recursive procedure as follows:

    P = procedure( lb, ub, <free-list> ) yields R1, ..., Rk;
        if ub < lb then undefined, ..., undefined
        else
            if ub = lb
            then E1( N=lb ), ..., Ek( N=lb )
            else let middle : integer,
                     x1 : R1, ..., xk : Rk,
                     y1 : R1, ..., yk : Rk;
                     middle = ( lb + ub ) / 2;
                     x1, ..., xk = P( lb, middle, <free-list> );
                     y1, ..., yk = P( middle+1, ub, <free-list> );
                 in x1 O1 y1, ..., xk Ok yk
                 end;
            end;
        end
    end P;


where the <free-list> is the list of identifiers (other than the identifier N appearing in the range
clause) that are free in each expression E_j.
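
The divide-and-conquer translation can be exercised with a small Python sketch. It is an illustration under stated assumptions, not the thesis's implementation: the function name `forall` and the representation of the eval clause as a list of (operation, expression) pairs are hypothetical, `None` stands for "undefined", and the two recursive calls, which are independent and could proceed concurrently, are evaluated one after the other here.

```python
import operator


def forall(lb, ub, evals):
    """Evaluate k eval clauses over N = lb, ..., ub; returns a tuple of arity k.

    evals is a list of (operation, expression) pairs, where each operation
    is binary and associative and each expression is a function of N.
    """
    if ub < lb:
        return tuple(None for _ in evals)            # undefined, ..., undefined
    if ub == lb:
        return tuple(expr(lb) for _, expr in evals)  # E_j( N = lb )
    middle = (lb + ub) // 2
    xs = forall(lb, middle, evals)       # independent of the call below
    ys = forall(middle + 1, ub, evals)
    return tuple(op(x, y) for (op, _), x, y in zip(evals, xs, ys))


A = [i * i for i in range(10)]  # sample array, chosen only for the example
result = forall(2, 5, [
    (operator.add, lambda i: A[i]),       # eval + A[i]
    (operator.mul, lambda i: A[i] + 1),   # eval * ( A[i] + 1 )
])
```

Associativity of each operation is what makes the split at `middle` legitimate: any bracketing of the operands gives the same value.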


It should be noted that the recursive procedure as defined is not the only translation
possible, since each recursion can create any fixed number of activations. The translation is
only intended to show that the construct can be supported within the framework of our
architecture without additional special functional units for dynamic creation of concurrent
computations on arrays. It is interesting to observe that similar types of forall expressions
cannot be easily defined on data structures that are not arrays. The problem is that we do not
have any information about the selector names of a data structure.


5.3.1 Constructing data structures 

It is possible to devise a mechanism for defining a more general form of forall
expressions on data structures provided that the implementation of data structures is known.
As we have mentioned in Section 2.4, a data structure is represented by a collection of items
each containing a set of tuples of the form (s, c), where c is either a scalar value or a uid of
another item. While it is possible to implement items capable of storing a variable number of
tuples, an efficient implementation can be based on storage nodes that can contain only a fixed
number of tuples; we shall call these nodes primitive items. In the latter scheme, an item may
be represented by more than one primitive item. An example of the representation of an array
with primitive items is shown in Figure 5.10. Each primitive item (pitem) consists of two tuples:

    { ("0" : C1), ("1" : C2) },

where C1 and C2 are either scalar values, uids of other pitems, or nils. The example is an
array A such that:

    A[4] = 2,
    A[2] = 3, and
    A[i] = nil, for all other i from 0 to 6.

In this representation, the traversal from the root node A to a leaf node defines an ordering
from less to more significant bits of the binary representation of the index to the array A.
Using this representation, we show two ways of constructing an array using the forall construct.

We define an associative operation construct for constructing an array from two arrays.
This operation is defined only when the indices of non-nil elements of the two arrays are disjoint.
This is satisfied when construct is used within the forall construct in a fashion such that the
condition for disjoint indices can be determined at compile time. The construct operation is
defined recursively as:
[Figure 5.10 diagram: a binary tree of pitems rooted at A; the root node contains the pitem
{ ("0" : uid1), ("1" : nil) }, and the leaves shown are A["010"] = A[2] and A["100"] = A[4].]

Figure 5.10. An array representation
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construct = procedure( A, B ) yields structure, 
if niX A) then B 


else if niK B) then A 
else if scatar( A) and scalar( B ) then error 
else if scalar( A ) theaemigrate( A, B) 
else if scalar( B ) then migrate( B, A) 
eke let Cp » construct( Ae0", Be"0" ); 
Cy = gonstruct( Ae'T’, Be'l” ) 
ee eee 


end; 
end; end; end; end; end; 
end construct; . 


? 


The operation • is defined such that the result of A•"0" is s, where A is a pitem
containing { ("0" : s), ("1" : t) }. The result of make-pitem( C1, C2 ) is a pitem { ("0" : C1), ("1" : C2) }.
The function of the make-hole( x ) operation is to create a hole H which is returned as the
output of construct and which is later filled with the item x. The procedure migrate( A, B )
takes a scalar value A and stores it into the leftmost available component of B, whose selector is
formed by a sequence of "0" bits. Figure 5.11 illustrates the manner in which the result of
construct( A1, A2 ) is created.
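The recursive definition of construct can be sketched in Python. This is only an illustration under assumptions that are not in the text: nil is modelled as None, a scalar as any non-tuple value, a pitem as a pair (C1, C2), and the make-hole step is left out because it matters only for concurrent execution. The helpers single and lookup are hypothetical names introduced for the example.

```python
def is_scalar(x):
    return x is not None and not isinstance(x, tuple)

def single(i, v):
    """Array holding only A[i] = v; selectors are consumed least significant bit first."""
    if i == 0:
        return v                      # remaining (higher) bits are all zero
    child = single(i >> 1, v)
    return (child, None) if (i & 1) == 0 else (None, child)

def migrate(s, b):
    """Store scalar s in the leftmost free component of pitem b along "0" selectors."""
    c0, c1 = b
    if c0 is None:
        return (s, c1)
    if is_scalar(c0):
        raise ValueError("indices are not disjoint")
    return (migrate(s, c0), c1)

def construct(a, b):
    """Merge two arrays whose non-nil indices are disjoint."""
    if a is None:
        return b
    if b is None:
        return a
    if is_scalar(a) and is_scalar(b):
        raise ValueError("indices are not disjoint")
    if is_scalar(a):
        return migrate(a, b)
    if is_scalar(b):
        return migrate(b, a)
    return (construct(a[0], b[0]), construct(a[1], b[1]))

def lookup(node, i):
    """Follow the bits of i from the root; a scalar is a hit only if no bits remain."""
    while isinstance(node, tuple):
        node = node[i & 1]
        i >>= 1
    return node if i == 0 else None

# The array of the example: A[4] = 2 and A[2] = 3.
A = construct(single(4, 2), single(2, 3))
```

In this model a scalar stored above the leaves stands for an index whose remaining bits are all zero, which is why migrate pushes it down the "0" components when it meets a pitem.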
An example of the use of construct in a forall construct is:

    forall i in (5, 100)
        eval * A[i] + B[i]
        eval construct append( nil, i+1,
                 if i = 5 then A[i] + A[i+1]
                 else if i = 100 then A[i-1] + A[i]
                 else A[i-1] + A[i] + A[i+1]
                 end; end; )
    end;
Notice in this example that the resulting array contains indices in the range (6, 101). In general,
the expression for the selector inside the append must be restricted to simple expressions to
guarantee the disjointness of the indices; and we leave this as an issue for language design.

Figure 5.11. An example for the working of construct


In the forall construct as presented, the range clause may only specify a range of integers. This is
undesirable in cases where the range is much larger than the number of data elements in the
array, because the number of activations created would be much larger than the number of
elements in the array. We introduce another form of specification of the range clause:

        <name> in range of A,
where <name> is an identifier that can range through all one-level indices of the array A; thus,
the range clause is not usable for specifying compound selectors such as the indices [i, j] of a
two-dimensional array.

An example of its use in a forall is:

    forall i in range of A
        eval * A[i] + A[i+1],
        eval construct append( nil, i, A[i] + B[i] )
    end;
The above forall expression can be translated into the following call to the recursive procedure
P:

    P( A, A, B, nil );
where
    P = procedure( a, A, B, i ) yields integer, array;
        if nil( a ) then 1, nil
        else if scalar( a ) then A[i] + A[i+1], A[i] + B[i]
        else let left1, left2 = P( a•"0", A, B, "0"•i );
                 right1, right2 = P( a•"1", A, B, "1"•i )
             in left1 * right1, make-pitem( left2, right2 )
        end;
    end P;

1. If the expression is an arbitrary function on i, then there is no simple compile time check for
this condition. One must define the semantics of data structures very carefully, if any
expression is allowed.
The result of the expression "0"•i is a concatenation of two bit strings such that, if i = "001", then
the result is "0001".¹ The procedure P works by “tracing” down the array A for each primitive
item a and by creating recursive procedure activations for the components a•"0" and a•"1" of the
primitive item. The construction of the resulting array C by using make-pitem is possible because the
selector in the append expression is of the simple form i; if not, the expression

        make-pitem( left2, right2 )
must be replaced by

        construct( left2, right2 ),
and the expression

        A[i] + B[i]
must be replaced by

        append( nil, exp, A[i] + B[i] ).
The reader can verify that the number of procedure activations created is the number of
leaf nodes of the array representation. A further step of optimization is possible for the above
example: notice that the value of the expression A[i] equals a when the predicate scalar( a ) is
true. Thus, there is a significant amount of compile time analysis involved in translating the
forall construct into the procedure P. We note that the above translation together with the
optimization can result in significantly more efficient programs.
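The translation into P can be mimicked in Python as a rough sketch. The assumptions here are ours, not the thesis's: the array is a fixed-depth binary trie of nested pairs built by a hypothetical helper build, nil is None, and the two eval bodies are passed in as functions f and g so that the traversal itself stays generic.

```python
def build(elems, width):
    """Binary trie over width-bit indices; selectors run least significant bit first."""
    def go(items, depth):
        if not items:
            return None
        if depth == width:
            return items[0][1]        # exactly one element per index
        zero = [(i, v) for i, v in items if not (i >> depth) & 1]
        one = [(i, v) for i, v in items if (i >> depth) & 1]
        return (go(zero, depth + 1), go(one, depth + 1))
    return go(sorted(elems.items()), 0)

def P(a, f, g, bits=""):
    """Visit only the non-nil leaves; return (product of f(i), trie of g(i))."""
    if a is None:
        return 1, None
    if not isinstance(a, tuple):      # scalar leaf: bits now spells the index
        i = int(bits, 2)
        return f(i), g(i)
    left1, left2 = P(a[0], f, g, "0" + bits)
    right1, right2 = P(a[1], f, g, "1" + bits)
    return left1 * right1, (left2, right2)

def lookup(node, i, width):
    for d in range(width):
        if node is None:
            return None
        node = node[(i >> d) & 1]
    return node

A = {2: 3, 4: 2, 5: 7}
B = {2: 10, 4: 20, 5: 30}
# Missing neighbours are treated as 0 here, which is an assumption of this example.
product, C = P(build(A, 3),
               lambda i: A[i] + A.get(i + 1, 0),
               lambda i: A[i] + B[i])
```

Only three leaves are visited for the three defined elements, regardless of how large the index range is, which is the point of the "in range of" form.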

The two forall translation schemes presented provide more expressiveness for the
language but are dependent on the representation of data structures. Further extensions for
allowing the range clause to include data structures in general can be envisioned. In particular,
the latter form of range clause can be readily extended to data structures.

1. We will assume that the representation of such bit strings is not difficult.


5.4 Nondeterminate merge of streams

In this section we introduce a primitive that can be used to produce a stream by
nondeterminately merging two streams. We believe this primitive may be used successfully in
building well structured programs. Often, nondeterminacy in a computation can be expressed
using arbitration among streams of values, and procedures that operate on the resulting
streams. (It is not clear that there are no forms of nondeterminate computation that have only
awkward realizations in terms of streams, and this is an area for further research.) The
particular implementation of the nondeterminate merge of streams is in terms of a recursive
procedure and is reasonably efficient.

A primitive nondeterminate merge actor, the n-merge actor shown in Figure 5.12, has two
inputs I1 and I2, three outputs O1, O2 and O3, and two states, first and second. In the first
state, an n-merge actor can fire as soon as an input token arrives at either one of the input arcs
I1 or I2. Upon firing, it places the input token on O1; and, on the second output arc O2, it
places an integer i if Ii is the input arc having received the token. After the firing, it enters the
state second to expect another token. In this state, the second token is simply absorbed and a
signal is placed on O3; and the actor returns to the first state. If two tokens arrive
simultaneously, then one token is selected and placed on O1; an integer indicating this selection
is placed on O2; a signal is placed on O3; and the other token is simply absorbed. We
show a correct implementation of the n-merge in Appendix A.¹

1. Since the firing rule depends on the timing of the arrival of input tokens, an Execution
Controller must implement this critical region correctly. Furthermore, the n-merge actor
requires two firings, and the implementation must be consistent with the instruction execution
scheme described in Section 4.2.
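The firing rules of the n-merge actor amount to a two-state machine, which can be sketched as follows. The class and method names are hypothetical, outputs are returned as a dictionary keyed by arc name, and the selection made on simultaneous arrival is delegated to a caller-supplied function; none of this is the implementation given in Appendix A.

```python
class NMergeActor:
    """Two-state model of the n-merge actor: states "first" and "second"."""

    def __init__(self):
        self.state = "first"

    def arrive(self, port, token):
        """A single token arrives on input arc I1 (port 1) or I2 (port 2)."""
        if port not in (1, 2):
            raise ValueError("port must be 1 or 2")
        if self.state == "first":
            self.state = "second"
            return {"O1": token, "O2": port}   # forward the token and its origin
        self.state = "first"
        return {"O3": "signal"}                # second token is absorbed

    def arrive_both(self, token1, token2, select):
        """Both tokens arrive together; select() returns the winning port."""
        if self.state != "first":
            raise RuntimeError("simultaneous arrival is only modelled in state first")
        port = select()
        token = token1 if port == 1 else token2
        return {"O1": token, "O2": port, "O3": "signal"}

actor = NMergeActor()
first_firing = actor.arrive(2, "b")
second_firing = actor.arrive(1, "a")
```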
Figure 5.12(a). Firing rules for n-merge

Figure 5.12(b). Firing rules for n-merge
The recursive procedure “N-Merge” in Figure 5.13 defines a nondeterminate merging of
two input streams using the n-merge actor. Each activation of the recursive procedure obtains
the first elements of streams S1 and S2 and merges the two values nondeterminately with an
n-merge. The first arriving value is cons'ed to the recursive call on the other stream and the
rest of the arriving stream. This recursive definition performs the merging of two streams at
the expense of some redundancy in the number of first operations on the streams to be merged,
since the slower of the arriving values first(S1) and first(S2) at the n-merge actor is discarded
and the subsequent recursive activation also performs a first operation on the slower stream
value. Thus, the number of first operations on two input streams of length n and m is bounded
above by 2(n + m). Another problem of the recursive N-Merge is that the number of
activations is about the same as the number of operations waiting for stream values which have
not been generated.¹ It is possible to remove these inefficiencies by introducing a set of data
flow actors connected in a cyclic fashion (see Appendix B). Unless the inefficiency of the
recursive definition is severe, the cyclic definition is unnecessary.
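The recursion of N-Merge can be tried out with a small Python model. It is a sketch under our own assumptions: streams are finite lists, the end of a stream is signalled by a sentinel UNDEFINED, and the timing-dependent choice of the n-merge actor is replaced by a pseudo-random arbiter (n_merge_actor, a hypothetical stand-in).

```python
import random

UNDEFINED = object()   # stands for the "undefined" value at the end of a stream

def first(s):
    return s[0] if s else UNDEFINED

def n_merge_actor(x1, x2, rng):
    """Stand-in arbiter: report one of the two tokens as having arrived first."""
    port = rng.choice((1, 2))
    return (x1, 1) if port == 1 else (x2, 2)

def n_merge(s1, s2, rng):
    """Follows the structure of N-Merge: keep the winner, retry with the loser."""
    x, i = n_merge_actor(first(s1), first(s2), rng)
    if i == 2:
        return list(s1) if x is UNDEFINED else [x] + n_merge(s1, s2[1:], rng)
    return list(s2) if x is UNDEFINED else [x] + n_merge(s1[1:], s2, rng)

merged = n_merge([1, 3, 5], [2, 4, 6], random.Random(0))
```

Whatever choices the arbiter makes, the result is an interleaving: every element appears once and the relative order within each input stream is preserved.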


5.5 Discussion 

There are a number of extensions that are convenient for writing procedures on
streams. In many situations we find it necessary to generate a stream of values with a base
value followed by values of some constant increment. This stream value can be simply
expressed as:

        [base by increment until final_value].
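As an informal illustration, such a stream expression behaves like a generator. The Python sketch below assumes that final_value is included when it is reached exactly and that a negative increment counts downward; the text does not settle either point, and the function name is hypothetical.

```python
def stream(base, increment, final_value):
    """Yield base, base + increment, ... up to and including final_value."""
    if increment == 0:
        raise ValueError("increment must be non-zero")
    value = base
    while (value <= final_value) if increment > 0 else (value >= final_value):
        yield value
        value += increment

odds = list(stream(1, 2, 9))
```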


1. Notice, however, that the cost of keeping these activations active is relatively small, since only
a very small number of operand records would reside in the system. But this situation can be
intolerable when one of the streams is never generated or gets arbitrarily far behind the other.
    N-Merge = procedure( S1, S2 : stream of T ) yields stream of T;
        let x, i = n-merge( first( S1 ), first( S2 ) );
            Y1, Y2 = rest( S1 ), rest( S2 )
        in if i = 2
           then if undefined( x ) then S1 else cons( x, N-Merge( S1, Y2 ) ) end;
           else if undefined( x ) then S2 else cons( x, N-Merge( Y1, S2 ) ) end;
        end;
    end N-Merge;

Figure 5.13. A recursive nondeterminate merging of two streams
Conversion between an array and a stream is also often necessary.

A more important language problem, however, is whether data types stream of <stream
type> are needed. The implementation described in Section 5.2 naturally extends to stream of
<stream type>. It is not clear, however, that such extensions are of significance to expressing
concurrent operations on streams. From the point of view of defining formal semantics for the
language, it is much cleaner to have data types stream of stream, or array of stream.

We give an example illustrating the expressiveness of stream of stream. In
performing computations on arrays it is often useful to have the type stream of stream. The
program in Figure 5.14 is often referred to as a “hyperplane” computation on arrays. Figure 5.15
is a diagrammatic explanation of the manner in which the computation “Hyper” is performed.
The top horizontal array C' corresponds to the stream C, and the left vertical array B'
corresponds to the stream B. In the lower right quadrant bounded by the two arrays C' and B',
the two-dimensional array D' corresponds to the output of the procedure “Hyper”. Each point
on a row of D' is computed using the procedure “Compute” by taking the west, the north-west,
and the north neighbors of the point. The value of the point is computed by applying the
function “Neighbor” to the values of its neighbors. The dotted lines show how points of the
array D' (or the stream of stream D) are produced as the computation proceeds.

In this example, the amount of concurrency is at most the number of elements in the 
stream B, but this concurrency is not achievable if the computation is expressed with arrays. 
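The data dependencies of the hyperplane computation can be checked with a sequential Python sketch. Lists stand in for streams here, so the pipelined concurrency between rows that the stream version permits is not modelled; the function names and the sample Neighbor function (a plain sum) are assumptions made for the example.

```python
def compute(c, b, neighbor):
    """One row of D'. c is the row above, led by its left boundary element;
    b is the left boundary value (the first west neighbour) of the new row."""
    row = []
    west = b
    for north_west, north in zip(c, c[1:]):
        west = neighbor(west, north_west, north)
        row.append(west)
    return row

def hyper(bs, c, neighbor):
    """All rows of D': each row is computed from the previous row and one b."""
    rows = []
    for b in bs:
        d = compute(c, b, neighbor)
        rows.append(d)
        c = [b] + d               # corresponds to cons( b, D ) in Hyper
    return rows

rows = hyper([1, 1], [1, 1, 1, 1], lambda w, nw, n: w + nw + n)
```

In the stream version each row can start as soon as the first elements of the row above exist, so up to one activation of Compute per element of B can be active at once.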

Extensions of the language to include other forms of nondeterminate primitives are of
critical significance. Can streams be used to implement language primitives similar to the
monitor [Hoar72]? We leave this as a further research issue.
    Hyper = procedure( B, C : stream of integer ) yields stream of stream of integer;
        if empty( B ) then ()
        else let b : integer, D : stream;
                 b = first( B );
                 D = Compute( C, b );
             in cons( D, Hyper( rest( B ), cons( b, D ) ) )
             end;
        end;
    end Hyper;

    Compute = procedure( C : stream of integer, b : integer ) yields stream of integer;
        if empty( C ) then ()
        else if empty( rest( C ) ) then ()
        else let d : integer;
                 d = Neighbor( b, first( C ), first( rest( C ) ) );
             in cons( d, Compute( rest( C ), d ) )
             end;
        end; end;
    end Compute;

Figure 5.14. An example using stream of stream
Figure 5.15. An illustration of a hyperplane computation, in which D'[i, j] = Neighbor( W, NW, N )
Chapter 6. Supporting Data Structures and Activation Records


In this chapter we state several requirements for designing the Packet Memory to
support the structures used to implement the language. The Packet Memory stores three types
of objects: data structures (including procedure structures), activation records, and holes. We
propose that the implementation of all objects be based on allocation of items which are of fixed size.
Based on this design decision, we show how operations on these objects can be implemented
efficiently. Since the design of the Packet Memory has been pursued previously by [Denns75,
Acker77], we will not treat the Packet Memory in great detail. What concerns us is the manner
in which the Packet Memory must be used to correctly implement the objects. Functionally, the
Packet Memory maintains a pool of uid's for free items. Each item contains a fixed number of
tuples (s, c), where s is a selector name of some predefined size and c is either a scalar or the uid
of an item. For brevity, we will often use the word “item” to mean the content of the item
and/or its uid.

We discuss how these objects can be efficiently implemented in a Packet Memory
organization that has multiport and multicache memory. Of particular interest in this
organization is the cache organization, which achieves concurrency of simultaneous access to an
item; this organization may be applicable to other concurrent systems.


6.1 Packet Memory 


The organization among the Packet Memory, Structure Controller and Execution
Controller is shown in Figure 6.1. The Structure Controller receives data structure operation
packets from the Arbitration Network and sends result packets to the Distribution Network.
The hole-operation output port of the Structure Controller is connected to an input port of the
Arbitration Network. (This connection is not shown in Figure 4 of Chapter 4.) The function
of the port is explained in Section 6.2. Each Structure Controller Module (SCM) and
Execution Controller Module (ECM) is connected to the Packet Memory via a Command
(CMND) port and a Response (RSP) port. A Command port receives commands on an item
specified by its uid, and the response is eventually returned to the Response port associated with
the Command port. The types of commands include reading an item, writing an item,
requesting a free uid, and changing the reference count of an item. These commands are issued
by both ECM's and SCM's, and processing of a result packet or a data structure operation
packet may require more than one command.

Figure 6.1. Organization among SC, EC, and PM

The Packet Memory consists of a Packet Memory Network (PMN) and a set of
Memory Modules (MM). The PMN is a packet routing network whose nodes may be cache
modules (CM) that have cache memory for frequently accessed items and the necessary control
functions for management of the cache. One approach for generating unique identifiers is to let
a uid be an address from the physical address space formed by storage nodes of the lowest level
of the memory hierarchy of the Packet Memory.¹ For example, using current technology, the
physical address space would consist of all addresses of secondary on-line storage devices such
as disks. Each storage module in higher levels of the hierarchy acts as a cache, and in general
each entry in such a storage module must contain both the data of the item and its full physical
address (i.e. its uid). Many techniques can be applied to the design of caches for finding an
item: for instance, searching (possibly including tree search techniques), hashing, or hardware
associative matching. The criterion for placement and replacement of an item in a cache is not
a central issue for us here, but a possible candidate is the Least Recently Used (LRU) replacement
algorithm that has proven attractive for demand paging memory management. For further
study, we refer readers to [Acker77] for details of a possible implementation of the Packet
Memory including the design of CM's, to [Smith76] for set associative memory organization, and
to [Denng70] for a general discussion on paging systems.

1. Another method for generating unique identifiers is to use counters that are never reset, or
are reset very infrequently. Our approach is shared by Snyder's work [Synde79] on
architectures for object-oriented languages like CLU [Lisko78]. The main reasons for not
choosing the counter scheme are that it requires the lowest level memory to store both the uid of
an item and the data and that accessing an item can be prohibitively expensive if a search needs
to be conducted at the lowest level of the hierarchy. We should remark that the efficiency
arguments presented here may not be justified considering the projected technological
developments and increasing sophistication of storage devices.

Assuming that each Memory Module stores a distinct subset of the total uid's, a basic
design consideration is the manner in which an item can be moved or copied in the PMN.
Informally, we say a caching scheme is a “unique access” scheme if, for each item, the set of
caches reachable from CMND ports to a MM forms a linear path; otherwise, it is called
“multi-access”, since the set forms paths containing branches. Figure 6.2(a) illustrates a unique
access structure where the network routes command packets for the same item from any
command port to the same cache module, and Figures 6.2(b) and (c) illustrate two multi-access
structures. It is often possible that a multi-access caching structure behaves like a unique access
structure when used in a restricted manner. For example, when commands on an item are
always presented at the same input port of the cache structure shown in Figure 6.2(b), the
caches reachable from the port to the MM associated with the item form a linear path. The
structure in Figure 6.2(c) does not have this property because the set of caches on the paths
from the input port 3 to the memory module MMg does not form a linear path.
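One way to make the informal definition concrete is to list, for a given item, the caches each CMND port passes through on the way to the item's MM and to test whether together they lie on a single chain. The Python sketch below reflects our reading of "forms a linear path": every port's path must be a tail of the longest one. The cache names and the example topology are invented and are not taken from Figure 6.2.

```python
def forms_linear_path(paths):
    """paths: one list of cache names per CMND port, ordered from port to MM.
    True when all caches lie on one chain ending at the memory module."""
    longest = max(paths, key=len)
    return all(longest[len(longest) - len(p):] == p for p in paths)

# A tree-shaped cache structure: two ports share the cache nearest the MM.
tree = {"port1": ["CM1", "CM3"], "port2": ["CM2", "CM3"]}

used_from_any_port = forms_linear_path(list(tree.values()))
used_from_one_port = forms_linear_path([tree["port1"]])
```

The second call corresponds to the restricted use described above: the same multi-access structure behaves as a unique access structure when all commands on the item enter at one port.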

For the PMN, we expect its caching structure to belong to the class exemplified by the
structure in Figure 6.2(b). We classify items into two classes, restricted and unrestricted,
according to how they are used. We do not statically partition all items into two classes because
it is desirable to be able to use a free item in either manner and because the distribution of
their usage is not a parameter that we can determine safely. Using this classification, we
describe the manner in which an item is handled by the cache structure of the Packet Memory.
Figure 6.2(a). A unique-access Packet Memory Network

Figure 6.2(b). A multi-access Packet Memory Network with unique-access property

Figure 6.2(c). A multi-access Packet Memory Network
Since a restricted item is accessed only through a particular CMND port¹ and all
commands result in memory references along the unique path, there is no need to have several
copies of the item. A restricted item is, therefore, moved along the caches on the path rather
than copied. The first use of a free restricted item is a writing command to a CMND port
which creates an instance of the item. Subsequent commands on the item must be from the
same CMND port and may cause the item to be moved into a cache at a higher level of the
PMN hierarchy. Such items have the nice property that they can be updated without the
consistency problem of multiple copies in several caches (or, the multi-cache coherence problem).
A consequence of this property is that a restricted item can be garbage collected as soon as its
reference count becomes zero. As we shall see in Section 6.2, we use this property of restricted
items to implement activation records and holes.

For unrestricted items, we allow copies to exist in several CM's to provide the
opportunity for alleviating contention over a single copy of the item by storing several instances
of the item in different caches. We shall call such copies instances of an item. Initially, an item
must be written by a command from some CMND port. This command must write through all
caches leading to a unique memory module MM from which all higher level caches can access
the item. The command does not acknowledge completion of the operation until this
write-through operation is completed. Subsequent commands on the item may cause instances
of the item to be stored in caches of higher level, and operations are performed on them. It is
evident that it is possible to have inconsistent instances if the content of an unrestricted item
can be updated. Therefore, we require that all subsequent operations on unrestricted items be
commands on reference counts or for reading the item. This requirement is naturally satisfied
by the semantics of the language, whose data structure operations are free of side-effects. We
now present a scheme by which an item can be garbage collected correctly. This garbage
collection scheme is correct only when the set of caches reachable for an item forms a tree-like
structure with the MM as its root, such as the structure shown in Figure 6.2(b). Furthermore, no
garbage collection is performed on copies in the PMN.

1. The particular port for accessing an item is fixed over the lifetime of an item, i.e. from its
removal from the free uid port until it is garbage collected again, but need not be the same in
different lifetimes for the cache structure shown in Figure 6.2(b).

Each instance of an unrestricted item contains a copy count indicating how many copies
have been made directly from it. Each time an item is copied from one cache to another, the
reference count and the copy count of the new instance are set to zero, and the copy count of the
source instance is incremented by one. Upon completion of copying, commands can be
exercised on the new instance. If an instance is displaced from a cache, its reference count is
added to the reference count of the source instance, whose copy count is then decremented by
one. We require that an instance be displaced from a cache only if its copy count is zero; this
ensures that all existing instances form a properly connected tree and that only instances at the
leaf nodes are displaced. For all instances created by the initial write-through, except the one in
MM, reference counts will be zero and copy counts will be one. The instance in MM contains a
reference count of one and a copy count of one, and possibly a tag identifying it as the root
node instance.
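The bookkeeping of reference counts and copy counts can be followed in a small Python model. It is a sketch with hypothetical names: the instance in the MM is the one without a source, copies are made with copy(), and the initial write-through chain is not modelled, so the root starts with a copy count of zero rather than one.

```python
class Instance:
    """One cached instance of an unrestricted item."""

    def __init__(self, source=None, ref_count=0):
        self.source = source        # instance this one was copied from (None in MM)
        self.ref_count = ref_count  # local reference count, may go negative
        self.copy_count = 0         # copies made directly from this instance

    def copy(self):
        self.copy_count += 1
        return Instance(source=self)      # new instance starts with both counts zero

    def displace(self):
        """Merge this instance back into its source; only leaves may be displaced."""
        if self.source is None:
            raise RuntimeError("the root instance in MM is not displaced")
        if self.copy_count != 0:
            raise RuntimeError("only an instance with copy count zero may be displaced")
        self.source.ref_count += self.ref_count
        self.source.copy_count -= 1

    def collectable(self):
        return self.source is None and self.ref_count == 0 and self.copy_count == 0

root = Instance(ref_count=1)    # the instance in MM after the initial write
c1 = root.copy()
c2 = c1.copy()
c2.ref_count += 2               # two references created through c2
c1.ref_count -= 3               # three references dropped through c1
before = root.collectable()     # False: a copy is still outstanding
c2.displace()
c1.displace()
after = root.collectable()      # True: counts have been accumulated to zero
```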

This scheme allows an inaccessible item to be garbage collected eventually as the result
of merging instances of inaccessible items displaced from caches. That the reference count of
the final unique instance is correct can be seen by noticing that the correct reference count is the
sum of all reference counts, some negative, of all instances; and that the strict displacement algorithm
and the tree-like access paths ensure that the copy count of the unique instance is zero if and
only if all reference counts have been accumulated. The garbage collection of an item takes
place if the reference count and the copy count of the root node instance are found to be zero.
The scheme can be very slow in reclaiming inaccessible items if some instance is not displaced
from a cache. This situation could be a problem if free items in the Packet Memory are in
short supply and the system is in a state such that instances are not displaced from caches
due to lack of movement of items in the Packet Memory. This situation, however, would not
arise frequently in a well designed Packet Memory.


6.2 Activation records and holes 
We implement activation records and holes with restricted items because efficient
implementation of these objects requires updating the contents of items. Operations on
restricted items are handled differently in implementing these objects for efficiency. The
lifetime of an item is defined from its removal from a free list to the next time it is placed on a
possibly different free list. If an item is used by an ECM as a part of an activation record, then
all subsequent commands are guaranteed to be issued by the same ECM. But if an item is used
as a hole, during its lifetime, its uid can be sent to different ECM's or SCM's. Thus, there must
be a way to guarantee that all commands are received by the same CMND port. Conceptually, the
CMND port can be different over different lifetimes. But this is difficult to implement, since all
ECM's and SCM's must somehow know the different CMND ports designated to different
lifetimes of an item. The simplest way to ensure that all ECM's and SCM's send commands on
an item to the same CMND port is to assign the CMND port statically using some function F
from all uid's to CMND port identifiers. We elaborate on this when we discuss an
implementation of holes.


6.2.1 Activation records 
An activation record is a dynamic tree-like structure representing an array such that an
operand record for an instruction instance (A, i) can be reached from the root node item A by
accessing a set of items using the binary bit representation of the selector i. Each item may
contain an operand record, or either one or both tuples in { ("0" : a0), ("1" : a1) }, where a0 and
a1 are uid's. We envision that an operand record can be stored in an item since we can make
all actors have a small number of input and output arcs.
Initially, an activation record consists only of the root node A with a single component
"text".¹ The Distribution Network routes a result packet (A, i; k, v, count) to an Execution
Controller Module ECM_H(A) determined by some hash function H from uid's to indices of
ECM's. The arrival of the result packet modifies the activation structure A using the bit string
representation of i by accessing all items until the operand record is found. If the operand
record is not in the activation record, the last item on the path of access is modified to include
the necessary items by acquiring more free restricted items. Thus, the first arriving operand
always results in allocation of free items, and subsequent arrivals of operands to the same
operand record simply modify the existing operand record.
We now present how reference counts can be used to manage items in an activation
record:
(a) create-activation( P )
    This operation creates an activation record A whose reference count is one, and the
    reference counts of items leading to the “text” component are set to one. The leaf
    item has the uid of the procedure structure P.
(b) insert( A, i, v )
    This operation adds one to the reference counts of all items leading from the root node
    A to the operand record (A, i).
(c) remove( A, i )
    This operation is performed by an SCM when it finds an instruction is enabled after
    an insert operation. The operation decrements all reference counts of items leading
    to (A, i) by the value of count.
(d) free( A )
    This decrements the reference count of the root node of the activation record by one,
    thus allowing it and the “text” component to be garbage collected.
The scheme maintains the reference count of an item such that it is equal to the number of
arrived operands in operand records which are waiting for enabling and can be reached from
the item.

1. We assume that the selector “text” can be encoded as a binary bit string without conflicting
with integers used for instruction numbers.
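The four operations can be prototyped to check the invariant stated above. The Python sketch below is illustrative only: items are ordinary objects rather than fixed-size Packet Memory items, instruction numbers use a fixed number of bits, the "text" component is omitted, and an item whose count falls to zero is simply unlinked from its parent. All class and method names are hypothetical.

```python
class Item:
    def __init__(self):
        self.ref_count = 0
        self.children = {}    # selector bit "0" or "1" -> Item
        self.operands = {}    # operand number k -> value (used at leaves only)

class ActivationRecord:
    def __init__(self, width=4):
        self.width = width
        self.root = Item()
        self.root.ref_count = 1           # create-activation

    def _path(self, i, create):
        """Items from the root to the operand record of instruction i."""
        node, path = self.root, [self.root]
        for d in range(self.width):
            bit = str((i >> d) & 1)
            if bit not in node.children:
                if not create:
                    raise KeyError(i)
                node.children[bit] = Item()   # acquire a free restricted item
            node = node.children[bit]
            path.append(node)
        return path

    def insert(self, i, k, v):
        path = self._path(i, create=True)
        for item in path:
            item.ref_count += 1
        path[-1].operands[k] = v

    def remove(self, i, count):
        path = self._path(i, create=False)
        operands = path[-1].operands
        for item in path:
            item.ref_count -= count
        for d in range(self.width, 0, -1):    # unlink items that reached zero
            if path[d].ref_count == 0:
                del path[d - 1].children[str((i >> (d - 1)) & 1)]
        return operands

    def free(self):
        self.root.ref_count -= 1
        return self.root.ref_count == 0       # True when A can be collected

ar = ActivationRecord(width=3)
ar.insert(5, 1, 10)
ar.insert(5, 2, 20)
ar.insert(4, 1, 7)
waiting = ar.root.ref_count - 1               # operands waiting below the root
enabled = ar.remove(5, 2)                     # instruction 5 had two operands
```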

The presentation has been made based on the assumption that a selector name used in
each item is a single binary digit "0" or "1". This makes operations on activation records easier
to understand, but introduces an apparent inefficiency in that many items are required to encode
the instruction number i. Since an activation record is likely to be sparse most of the time, it is
possible to reduce the number of items used to represent the sparse structure by using prefix
compression. An example of such a representation is shown in Figure 6.3. This added
saving on usage of items results in faster instruction execution on the average. While this
representation using prefix compression requires more complex update operations on items, we
feel the complexity is justified considering the cost of accessing an item.
Similarly, we believe prefix compression can be applied profitably to the representation
of data structures in general.
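The saving from prefix compression can be estimated by counting items. The Python sketch below assumes one particular form of compression, in which chains of single-child items are collapsed so that only the root, branching points, and leaves remain; the exact scheme of Figure 6.3 may differ, and both function names are hypothetical.

```python
def plain_items(indices, width):
    """Items in an uncompressed binary trie: one per distinct selector prefix."""
    keys = {format(i, "0{}b".format(width)) for i in indices}
    return len({key[:d] for key in keys for d in range(width + 1)})

def compressed_items(indices, width):
    """Items left when chains of single-child items are collapsed."""
    keys = {format(i, "0{}b".format(width)) for i in indices}
    prefixes = {key[:d] for key in keys for d in range(width + 1)}
    return sum(1 for p in prefixes
               if p == ""                                        # root
               or len(p) == width                                # operand record
               or (p + "0" in prefixes and p + "1" in prefixes)) # branching point

sparse = [4, 5]                # two waiting instructions, 8-bit instruction numbers
plain = plain_items(sparse, 8)
compressed = compressed_items(sparse, 8)
```

For this sparse record the count drops from ten items to four, at the price of having to split an item when a new instruction number diverges inside a compressed prefix.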


Figure 6.3. An example of prefix compression; panel (a): an activation record not using prefix compression


6.2.2 Holes
The create-hole operation simply obtains an item and tags it “unfilled”; the uid
is marked as a “hole” and returned as its result. If the hole is in the “unfilled” state, data
structure operations or commands¹ on a hole that require reading its data are simply stored in a
pool of items storing these operations. Commands such as reference count updates need not be
stored since they do not need to use the data of the hole. Since a hole may occur as a
component of a data structure, a Structure Controller may encounter a hole when processing a
data structure operation packet. The hole-operation output port allows an SCM to send a data
structure operation packet through the Arbitration Network to a specific SCM associated with
the CMND_F(uid) port. To guarantee this, the design of the Arbitration Network is much
simplified if the function F is implemented in the [...].
The reference count processing for restricted items used for holes is the same as
reference count accounting for items used in data structures. Note that operations pooled for a
hole should not change the reference count of the item until the hole is filled. This avoids the
potential problem that the reference count of a hole may become zero before these operations
are processed.
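The behaviour of a hole can be summarized by a short Python model. This is a sketch with hypothetical names: operations that need the data are represented as callbacks and pooled while the hole is unfilled, reference count commands are applied at once, and the routing of all commands to one CMND port is not modelled.

```python
class Hole:
    def __init__(self):
        self.state = "unfilled"
        self.value = None
        self.ref_count = 1
        self.pending = []          # pooled operations that need the data

    def read(self, consumer):
        """An operation that reads the data: pooled until the hole is filled."""
        if self.state == "unfilled":
            self.pending.append(consumer)
        else:
            consumer(self.value)

    def change_ref_count(self, delta):
        """Reference count commands do not need the data, so they are not pooled."""
        self.ref_count += delta

    def fill(self, value):
        self.state = "filled"
        self.value = value
        pooled, self.pending = self.pending, []
        for consumer in pooled:    # release the pooled operations in arrival order
            consumer(value)

results = []
hole = Hole()
hole.read(results.append)          # arrives before the hole is filled
hole.change_ref_count(+1)
early = list(results)              # nothing delivered yet
hole.fill(42)
hole.read(results.append)          # arrives after filling, served directly
```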


6.3 Remarks 

We have informally discussed how activation records and holes can be implemented
using restricted items. This is based on the assumption that the Distribution Network must
route all result packets with the destination (A, i) to the same ECM. Thus, all operations on
restricted items used in the activation record A are guaranteed to be sent to the same CMND
port. Using this representation, then, a natural optimization is to allocate an activation record
“close” to the procedure structure or its copies in caches. A similar optimization is possible for
data structure operations if the Arbitration Network can try to route most data structure
operation packets on an item to the same SCM if the contention for the same SCM is not
severe. This optimization will tend to make effective use of the cache memory bandwidth by
allowing a higher hit rate on the item.

1. We do not mean commands only here, because holes could be used to hold part of data
structures on which we want to further perform data structure operations.

The question of how far this optimization based on locality of data access should go
depends on the understanding of program behavior and is a challenging issue. On the other
hand, a large procedure may create more enabled instructions than a single ECM can
handle; in this case, a different approach for storing activation records may be devised that
allows an activation record to be distributed over several ECMs.

Chapter 7. Conclusion


Summary 

The expressiveness of a programming language affects not only programming tasks
but also how the underlying architecture can attain high performance through concurrent
operation of hardware. We feel that a language based on an applicative style of programming
is sufficiently expressive for most applications and, augmented with additional features, can
provide an approach for structured concurrent programming. Our preference for an applicative
style of programming is based on the observation that unexpected side-effects greatly
compromise confidence in the correctness of programs. For applications requiring high
performance systems, data flow analysis must be performed on programs to reveal the hidden
concurrency, and this analysis is more complicated than necessary because of language features
based on a sequential notion of execution. In this regard, APL has been suggested as a language
for vector and array processors, because it is more amenable to such analysis. APL, however, is
limited in its expressiveness because the data structures presented in Chapter Two of this thesis
cannot be easily mapped into arrays. Concurrency is expressed in several ways in the
value-oriented language that we introduced. Procedure activations allow many activations to be
executed simultaneously. Streams can be used to express concurrency in computations with a
strict ordering on accessing sequences of values. The forall constructs are for explicitly
specifying concurrent operation on data structures, particularly arrays.

The implementation of streams can be readily extended to streams of streams and is
based on the notion of “holes”. Two forms of forall constructs have been defined and can be
used to express computations on components of data structures using associative operations.
Concurrency expressed in these constructs derives from the property of associativity of
operations on components of data structures.

To show how concurrency in computation can be exploited, we used recursive data
flow schemas into which a program in the language can be translated. We proposed an
extended form of data flow processor that implements recursive data flow schemas using
procedure structures and activation records. These objects are supported by the Packet
Memory with a multiport and multicache storage structure. A solution is given to the problem
of maintaining the consistency of reference counts used for memory management, and this
allows simultaneous accesses to multiple instances of a data structure. We suggested in Chapter
Two a split-reference-weight scheme of memory management that removes the need for reference
count updates for each data structure operation. This scheme is of particular interest when a
data structure is frequently copied, as is the case in foralls.
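
The idea of avoiding a count update on every copy can be sketched as follows. This is written in the style of what is now commonly called weighted reference counting; whether it matches the Chapter Two scheme in detail is an assumption, and the names `Item`, `Ref`, `new_item`, and the initial weight of 64 are chosen only for illustration.

```python
class Item:
    """A data structure item. Invariant: `weight` equals the sum of the
    weights of all outstanding references to the item."""

    def __init__(self, weight: int) -> None:
        self.weight = weight
        self.updates = 0   # how often the item itself had to be touched
        self.freed = False


class Ref:
    """A reference carrying part of the item's total weight."""

    def __init__(self, item: Item, weight: int) -> None:
        self.item = item
        self.weight = weight

    def copy(self) -> "Ref":
        if self.weight > 1:
            # Split the weight between the two references: no item update.
            half = self.weight // 2
            self.weight -= half
            return Ref(self.item, half)
        # Weight exhausted: fall back to one update of the item (assumed policy).
        self.item.weight += 1
        self.item.updates += 1
        return Ref(self.item, 1)

    def drop(self) -> None:
        # Dropping a reference returns its weight to the item.
        self.item.weight -= self.weight
        self.item.updates += 1
        self.weight = 0
        if self.item.weight == 0:
            self.item.freed = True


def new_item(weight: int = 64):
    item = Item(weight)
    return item, Ref(item, weight)
```

In this sketch, copying touches the item only after a reference has been split down to weight one, which is why frequent copying benefits most.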

Data flow architectures differ from conventional concurrent systems particularly
because concurrency at the primitive operation level is easily achieved, and the difficulty of
process switching in conventional multiprocessor organizations can be avoided.


Suggestions for further research 
We first discuss language issues: the generality of streams and data structures whose
components may be holes; cycles in data structures and in communication paths between
processes; and nondeterminacy. We then discuss architecture issues.


Streams and data structures with holes

The concept of streams can be captured in terms of lists, queues, and arrays which are
accessed in a constrained manner. Streams provide a reasonable abstraction for expressing
concurrency among cooperating computations, but it requires some degree of adjustment to
think in terms of sequences of values. Since the manner in which accesses to structures are
constrained may not be immediately obvious to a casual user, it may not be easy to see when the
notion of stream is applicable. We see many computations, such as the hyperplane
computation illustrated in Chapter Five, where concurrency is substantially improved if we
express programs using streams. But as the reader may note, it is easier to understand the
recurrence equation for the computation than to understand the lengthy program using streams
of streams. Should we provide a compile time translation for such equations? How general can
such translators be?

If we allow data structures which are accessible when they do not have all of their
components, do we need streams? The author's opinion is that streams can be defined in terms
of a recursive data type which can be accessed when some of its components may not be
available, using holes. But does use of such data structures cause undesirable situations to
arise? One can conceive of a situation where the Packet Memory is overloaded with references
made to components which do not exist yet. How often do these situations arise? Can one
control such situations?
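
A minimal sketch of a stream defined as a recursive data type whose tail may be a hole is given below. All names (`Hole`, `Cons`, `push`, `close`, `for_each`, `END`) are hypothetical and the callback mechanism merely stands in for pooled operations.

```python
from typing import Any, Callable, List


class Hole:
    """A component that may not be available yet (hypothetical helper)."""

    def __init__(self) -> None:
        self.filled = False
        self.value: Any = None
        self.waiting: List[Callable[[Any], None]] = []  # pending references

    def when_filled(self, reader: Callable[[Any], None]) -> None:
        if self.filled:
            reader(self.value)
        else:
            self.waiting.append(reader)

    def fill(self, value: Any) -> None:
        assert not self.filled
        self.filled, self.value = True, value
        waiting, self.waiting = self.waiting, []
        for reader in waiting:
            reader(value)


END = object()  # marks the end of a stream


class Cons:
    """Stream cell: a head value and a tail that starts out as a hole."""

    def __init__(self, head: Any, tail: Hole) -> None:
        self.head = head
        self.tail = tail


def push(hole: Hole, value: Any) -> Hole:
    """Producer step: fill `hole` with a new cell and return the new tail hole."""
    tail = Hole()
    hole.fill(Cons(value, tail))
    return tail


def close(hole: Hole) -> None:
    hole.fill(END)


def for_each(hole: Hole, f: Callable[[Any], None]) -> None:
    """Consumer: apply f to each element as soon as it becomes available."""
    def on_cell(cell: Any) -> None:
        if cell is not END:
            f(cell.head)
            for_each(cell.tail, f)
    hole.when_filled(on_cell)
```

In this sketch the length of `waiting` is the number of references made to a component that does not exist yet, which is the quantity the overload question above is concerned with.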

Another issue relates to the general question of defining the semantics of aggregates of data
values such as data structures, streams, and lists of expressions. In this thesis, we assumed that
all computation terminates and that errors in the constituents of an aggregate do not imply the
error of the whole aggregate. In this view it is desirable that one can define a consistent way of
dealing with nonterminating computations which supply the component values. In general, it
may be required to determine when the output value of a nonterminating process is not needed
so a computation can be forcibly terminated to avoid wasting computing resources. This can be
done either continually, periodically or only when resources become scarce. One scheme of
garbage collecting unwanted processes continually has been proposed by Baker [Baker78]. Can
and should the scheme be applied to the data flow concept of computation?

Cyclic data structures and communication paths

The need for cyclic data structures and the need for cyclic communication paths between
processes are actually two separate issues.

The need for some representation of conceptual cycles in the representation of objects is
undeniable. But how are such conceptual structures mapped into data structures whose
operations have no side-effects? Consider the example of a doubly linked list L from which we
need to delete a node N. There are two ways to represent the list without side-effects: by using
immutable cyclic structures based on Henderson's work [Hende75], or by using an acyclic
structure. In the scheme using immutable cycles, a delete operation requires about the same
number of operations as the number of nodes in the list L, because a new cyclic structure must
be constructed to avoid side-effects. Thus the physical resemblance of the immutable cyclic
structure to conceptual cycles does not imply the conceptual simplicity of delete operations on
such a cycle. For the scheme using acyclic structures, one can see that a delete operation now
can be performed as a data structure operation which costs roughly log(n) operations on items,
where n is the number of nodes in the list L. This observation can be extended to operations
on graphs of other forms.
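
The logarithmic cost of a side-effect-free delete can be illustrated with a sketch in which the list is held in a balanced binary tree and a delete copies only the nodes on one root-to-leaf path. Representing the list as a size-annotated tree is an assumption made here for illustration; the thesis does not prescribe this particular acyclic structure, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Node:
    left: Optional["Node"]
    value: Any
    right: Optional["Node"]
    size: int  # number of elements in this subtree


def size(t: Optional[Node]) -> int:
    return t.size if t is not None else 0


def build(values: List[Any], lo: int = 0, hi: Optional[int] = None) -> Optional[Node]:
    """Build a balanced tree holding values[lo:hi] in order."""
    if hi is None:
        hi = len(values)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    left, right = build(values, lo, mid), build(values, mid + 1, hi)
    return Node(left, values[mid], right, size(left) + size(right) + 1)


def leftmost(t: Node) -> Any:
    while t.left is not None:
        t = t.left
    return t.value


def delete(t: Node, i: int, new_nodes: List[int]) -> Optional[Node]:
    """Return a new tree without element i; the old tree is left untouched.
    new_nodes[0] counts the nodes that had to be constructed."""
    ls = size(t.left)
    if i < ls:
        new_nodes[0] += 1
        return Node(delete(t.left, i, new_nodes), t.value, t.right, t.size - 1)
    if i > ls:
        new_nodes[0] += 1
        return Node(t.left, t.value, delete(t.right, i - ls - 1, new_nodes), t.size - 1)
    if t.left is None:
        return t.right
    if t.right is None:
        return t.left
    # Replace the deleted value by its successor and remove that successor.
    new_nodes[0] += 1
    return Node(t.left, leftmost(t.right), delete(t.right, 0, new_nodes), t.size - 1)


def to_list(t: Optional[Node]) -> List[Any]:
    return [] if t is None else to_list(t.left) + [t.value] + to_list(t.right)
```

Only nodes along a single path are rebuilt and every other subtree is shared between the old and the new version, so the number of constructed nodes is bounded by the depth of the tree.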

The implementation of procedures as values is related to data structures with cycles
when we need a mechanism to construct a procedure from existing ones using binding of
procedure names to their representations [Hende75]. Using immutable cycles to represent recursive
procedures seems natural in that there is no need to introduce the notion of environments in the
definition of procedural values. But the operations involving cyclic structures of procedure
representations will have the same problem as we have discussed previously.

Many forms of programs are more naturally expressed as a set of processes
communicating amongst themselves using cyclic communication paths. Examples are often seen
in various distributed message passing systems. Constructs of this form are not included in this
thesis, because we have not found one that allows the deadlock property to be determined at
compile time. It may be possible, however, to provide deadlock detection mechanisms at
runtime. If the mechanism does not introduce too much overhead for computations that do not
deadlock, such an approach may be desirable. In addition, it may also detect deadlocks due to
resource allocation. Much work has been done on detection of deadlocks among processes due to
resource allocation. Not much work, however, can be found in the area of detection of
processes which are in deadlock due to either synchronization or message handling. We hope
further work in this area provides additional insights into the complexity of these deadlock
detection schemes.


Nondeterminacy 

In large systems such as data base systems, operating systems, real time control systems,
and point of sale systems, the function of the systems is not necessarily determinate. Often, an
implementation of such systems must allow some degree of nondeterminacy and possibly tolerate
temporary inconsistency in their data base to achieve reasonable performance. The
nondeterminate merge function that we have introduced in this thesis is inadequate for
expressing many such forms of nondeterminacy.


Architecture 
In the architecture we presented, the performance is derived from concurrency on a
large scale. We made no assumptions about how concurrent operations can be mapped onto
Execution Controllers such that two instructions are located in some close neighborhood to
reduce communication delays, thus improving performance.

Is it possible that heuristics for allocating instructions close to each other can degrade
the potential performance of the processor due to bad allocation strategies? (Such processors
must have functional units close to the Execution Controller Modules and the network
structures may be quite different.) It is hard to evaluate these suggestions without
understanding both the behavior of programs and the technology of the hardware modules.
This issue is important because the cost of communication hardware is determined by
assumptions about locality of computation.

The issue of fault-tolerance must be adequately answered for a system such as our data
flow processor which has a large number of modules. We emphasize that when we are dealing
with a faulty system some additional operating system functions for handling faults may be
needed.

Ideally, we hope that a system based on data flow concepts can support a community of
users with the performance that concurrent operation can provide. Such a system necessarily
must provide a set of programming languages and various input and output functions. In
addition, it must provide reasonable mechanisms for controlling total activities in the system
such that finite computing resources can be used effectively. In conventional systems these
functions are supported by software and explicit machine level primitives for controlling
processors. How these functions can be provided on data flow processors is a very interesting
research issue.

Appendix A. Implementation of the n-merge actor


The implementation of the n-merge actor presented here requires two firings and
needs an additional input value F which represents the first state of the actor. For convenience,
we use the notation In[1:v1, 2:v2, 3:F] to mean that an operand record contains three input values:
v1 at the first input arc, v2 at the second input arc, and F at the third input arc for the state. If
there is no value present for an input arc we use the symbol ∅ in its place. For example,
In[1:∅, 2:v2, 3:∅] means only one input has arrived at the operand record. We use a similar notation
Out[1:v1, 2:1, 3:∅] to mean that the firing of the actor produces two outputs, v1 on the first output
arc and 1 on the second output arc, and no token on the third output arc.

The enabling count of the actor is defined to be two; thus, the actor is enabled
with any two of the three inputs. We describe the possible firings by cases:

(1) In[1:v1, 2:∅, 3:F]
    The output is Out[1:v1, 2:1, 3:∅] and in addition a result packet containing S,
    representing the second state, is sent to the same operand record at the third
    input. Since the only value that has not arrived is v2, the next firing will
    contain In[1:∅, 2:v2, 3:S] and the result of this firing is Out[1:∅, 2:∅, 3:signal].

(2) In[1:∅, 2:v2, 3:F]
    The output is Out[1:v2, 2:2, 3:∅] and in addition a result packet containing S is
    sent to the same operand record at the third input. Since the only value that
    has not arrived is v1, the next firing will contain In[1:v1, 2:∅, 3:S] and the result
    of this firing is Out[1:∅, 2:∅, 3:signal].

(3) In[1:v1, 2:v2, 3:∅]
    The firing must choose one of the two possible outputs:
    (3a) The output is Out[1:v1, 2:1, 3:∅] and in addition a result packet
         containing S is sent to the same operand record at the first input. Since
         the only value that has not arrived is F, the next firing will contain
         In[1:S, 2:∅, 3:F] and the result of this firing is Out[1:∅, 2:∅, 3:signal].
    (3b) The output is Out[1:v2, 2:2, 3:∅] and in addition a result packet
         containing S is sent to the same operand record at the first input. Since
         the only value that has not arrived is F, the next firing will contain
         In[1:S, 2:∅, 3:F] and the result of this firing is Out[1:∅, 2:∅, 3:signal].

The firing rules above do not include the case of all three input values being in
the operand record. This is because the insert operation on an operand record is implemented
as a critical region that allows one insertion to take place at a time, and an operand record is
enabled as soon as two values arrive. Notice that each firing will cause an instruction fetch,
and this is the consequence of our wish that Execution Controllers process all instructions
in the same manner.
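
The firing rules of this appendix can be traced with a small simulation. It is only a sketch: the class `NMergeRecord`, the `State` enumeration, the dictionary encoding of In and Out (absent keys stand for arcs with no value), and the `choose` parameter that resolves case (3) are all assumptions introduced for illustration. A real n-merge would make the choice in case (3) nondeterministically; the default here always picks arc 1 so that runs are repeatable.

```python
from enum import Enum
from typing import Any, Callable, Dict, List, Sequence


class State(Enum):
    F = "first state"
    S = "second state"


class NMergeRecord:
    """Operand record of an n-merge actor with enabling count two."""

    ENABLING_COUNT = 2

    def __init__(self, choose: Callable[[Sequence[int]], int] = lambda arcs: arcs[0]) -> None:
        self.slots: Dict[int, Any] = {}          # input arc -> value present
        self.outputs: List[Dict[int, Any]] = []  # one dict per firing
        self.choose = choose

    def insert(self, arc: int, value: Any) -> None:
        # Insertions are serialized, so at most two values are ever present.
        assert arc in (1, 2, 3) and arc not in self.slots
        self.slots[arc] = value
        if len(self.slots) == self.ENABLING_COUNT:
            self._fire()

    def _fire(self) -> None:
        slots, self.slots = self.slots, {}
        if State.S in (slots.get(1), slots.get(3)):
            # Second firing of every case: only the signal is produced.
            self.outputs.append({3: "signal"})
        elif slots.get(3) is State.F:
            # Cases (1) and (2): exactly one data value has arrived.
            arc = 1 if 1 in slots else 2
            self.outputs.append({1: slots[arc], 2: arc})
            self.insert(3, State.S)
        else:
            # Case (3): both data values are present, F has not arrived.
            arc = self.choose((1, 2))
            self.outputs.append({1: slots[arc], 2: arc})
            self.insert(1, State.S)
```

For example, inserting F at arc 3 and then a value at arc 1 produces the first output of case (1); a later insertion at arc 2 produces the signal.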




Appendix B. A cyclic schema for merging two streams 
The schema shown in Figure B has two inputs S1 and S2, each receiving a stream
represented as a structure, and Out is the output of the schema. The n-merges actor is enabled
as soon as one input arrives and produces two values: the stream value that arrived, on the s output
arc, and a boolean value on the output A: true if it is the first input, and false if it is the second
input. The schema uses a false gate F in the model of Dennis and Fosseen to avoid excessive
use of sink actors. The two actors conso and write-hole together form the cons actor introduced
in Chapter Five. The capitalized letters at the end of each arc imply connections between
actors to avoid confusion.

The cyclic schema works by constructing a stream using the conso and the
write-hole actors for each value of the two input streams. The schema recycles the arrived
stream structure to the proper input S1 or S2 determined by the boolean output B. The
construction of the output stream is rather complicated because the whole schema must signal its
completion of operation in some manner, and this is achieved by using the signal output of
the write-hole actor. The schema terminates its operation when one of the input streams is
empty and this adds additional complexity to the diagram.

Figure B. A cyclic schema for merging two streams


